Python Forum
Trying to delete rows above a specific datetime value
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Trying to delete rows above a specific datetime value
#1
So i have a csv file, and the data is structured like this

2022-04-25, 7:06 AM, bla bla bla
2022-04-25, 7:07 AM, ha ha ha
2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

And so on..

I need to add a "delete" option that i can provide the date and time and have it find that line and delete it as well as everything above that line..

So if i provided:
2022-04-25, 7:07 AM

I want it to delete that line and all the lines above it so my file will end up with only

2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

Im currently using these imports, can this be done with pandas?

import re
import pandas as pd
Reply
#2
You can make a new dataframe that only contains the values you want. Here I "delete" all values >= 50 and save the results in "low". You could do the same thing by comparing datetime objects.
import pandas as pd
import numpy as np

df = pd.DataFrame({"Letters":[a for a in 'ABCDEFG'], "Numbers":np.random.randint(1, 100, 7)})
low = df[df["Numbers"] < 50]
high = df[df["Numbers"] >= 50]
print("Low", low, sep="\n")
print("\nHigh", high, sep="\n")
Output:
Low Letters Numbers 1 B 26 2 C 8 High Letters Numbers 0 A 74 3 D 56 4 E 74 5 F 56 6 G 51
Reply
#3
I sometimes do a similar thing in MySQL, if I forget to set the correct date first.

As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.

def myApp():
    import csv, glob

    path2file = '/home/pedro/myPython/csv/'
    files = glob.glob(path2file + '*.csv')
    for f in files:
        print('The files are', f)

    myfile = input('Copy and paste a file here ... ')
    with open(myfile) as infile:
        my_csv_read = csv.reader(infile)
        data = []
        # csv reader is burned as soon as you read it, so make a list, lasts longer
        for line in my_csv_read:
            data.append(line)
            print(line)

    # working principle    
    x = '2022-04-25 7:07 PM'
    y = '2022-04-25 7:08 AM'

    if x > y:
        print('x is older than y')
    if x < y:
        print('x is younger than y')
        
    nums2delete = []
    # start at 1 if you have a headers row
   for i in range(1, len(data)):
        if data[i][0] >= '2022-04-25 7:07 AM':
            break
            
    wanted = data[0:i]
Reply
#4
(May-06-2022, 07:45 AM)Pedroski55 Wrote: As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.
Not using pandas,can compare but most make the string to a datetime object first,or will just compare strings.
Also using pendulum make it easier as can parse automatically to a datetime object.
That also will handle time zones correctly.
>>> import pendulum
>>> 
>>> x = '2022-04-25 7:07 PM'
>>> y = '2022-04-25 7:08 AM'
>>> x_date = pendulum.parse(x, strict=False)
>>> y_date = pendulum.parse(y, strict=False)
>>> x_date
DateTime(2022, 4, 25, 19, 7, 0, tzinfo=Timezone('UTC'))
>>> 
>>> x_date < y_date
False
>>> x_date.diff_for_humans(y_date)
'11 hours after' 
If using pandas most also remember to covert dates to Pandas datetime.
Example:
>>> import pandas as pd
>>> 
>>> d = {'Dates': ['2022-04-25 7:07 PM', '2022-04-25 7:08 AM'], 'Date_value': [111, 999]}
>>> df = pandas.DataFrame(d)
>>> df
                Dates  Date_value
0  2022-04-25 7:07 PM         111
1  2022-04-25 7:08 AM         999
>>> 
>>> df.dtypes
Dates         object
Date_value     int64
dtype: object
>>> 
>>> # Convert column to Pandas datetime64
>>> df['Dates'] = pd.to_datetime(df['Dates'])
>>> df.dtypes
Dates         datetime64[ns]
Date_value             int64
dtype: object
Reply
#5
So this is what i currently have that works as removing any records prior to the date provided, but unclear how to add the "time" portion into the equation so that i dont have to manually find the exact date/time and remove them after the file is created.

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-24')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]
What im at a loss with is, since the date and time are in separate columns, how do i adjust the mask to consider both as the criteria?

If i change the mask line to this i get a syntax error

mask = (df['Date'] > '2022-04-24') & (df['Time'] = '7:06 AM')
Reply
#6
Make a datetime column
Reply
#7
As in a virtual column? I cant mess with the original csv.. so if you mean make a new virtual column out of those 2 columns, i can do that, but how?
Reply
#8
Like this?

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# THIS CONVERTS THE TIME COLUMN INTO A DATETIME FORMAT
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
#mask = (df['Date'] > '2022-04-24')
mask = (df['DT'] > '2022-04-24 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]
Reply
#9
https://stackoverflow.com/questions/1797...hon-pandas
Reply
#10
If i run my updated code above, i get this error:

Error:
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1) TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  delete specific row of entries jacksfrustration 3 326 Feb-13-2024, 11:13 PM
Last Post: deanhystad
  Reading Specific Rows In a CSV File finndude 3 940 Dec-13-2022, 03:19 PM
Last Post: finndude
  Pymysql delete specific rows in tableview stsxbel 2 1,049 Aug-18-2022, 09:50 AM
Last Post: ibreeden
  The code I have written removes the desired number of rows, but wrong rows Jdesi1983 0 1,603 Dec-08-2021, 04:42 AM
Last Post: Jdesi1983
Question [SOLVED] Delete specific characters from string lines EnfantNicolas 4 2,144 Oct-21-2021, 11:28 AM
Last Post: EnfantNicolas
  openpyxl incorrect delete rows VladislavM 6 4,031 Jul-19-2021, 08:54 AM
Last Post: VladislavM
  Pandas DataFrame combine rows by column value, where Date Rows are NULL rhat398 0 2,083 May-04-2021, 10:51 PM
Last Post: rhat398
  Indexing [::-1] to Reverse ALL 2D Array Rows, ALL 3D, 4D Array Columns & Rows Python Jeremy7 8 6,963 Mar-02-2021, 01:54 AM
Last Post: Jeremy7
  How to sort rows based on specific order Mekala 3 2,184 Jul-31-2020, 01:01 AM
Last Post: bowlofred
  How to extract specific rows and columns from a text file with Python Farhan 0 3,352 Mar-25-2020, 09:18 PM
Last Post: Farhan

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020