Python Forum
Trying to delete rows above a specific datetime value - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Trying to delete rows above a specific datetime value (/thread-37148.html)

Pages: 1 2


Trying to delete rows above a specific datetime value - cubangt - May-05-2022

So i have a csv file, and the data is structured like this

2022-04-25, 7:06 AM, bla bla bla
2022-04-25, 7:07 AM, ha ha ha
2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

And so on..

I need to add a "delete" option that i can provide the date and time and have it find that line and delete it as well as everything above that line..

So if i provided:
2022-04-25, 7:07 AM

I want it to delete that line and all the lines above it so my file will end up with only

2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

Im currently using these imports, can this be done with pandas?

import re
import pandas as pd


RE: Trying to delete rows above a specific datetime value - deanhystad - May-05-2022

You can make a new dataframe that only contains the values you want. Here I "delete" all values >= 50 and save the results in "low". You could do the same thing by comparing datetime objects.
import pandas as pd
import numpy as np

df = pd.DataFrame({"Letters":[a for a in 'ABCDEFG'], "Numbers":np.random.randint(1, 100, 7)})
low = df[df["Numbers"] < 50]
high = df[df["Numbers"] >= 50]
print("Low", low, sep="\n")
print("\nHigh", high, sep="\n")
Output:
Low Letters Numbers 1 B 26 2 C 8 High Letters Numbers 0 A 74 3 D 56 4 E 74 5 F 56 6 G 51



RE: Trying to delete rows above a specific datetime value - Pedroski55 - May-06-2022

I sometimes do a similar thing in MySQL, if I forget to set the correct date first.

As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.

def myApp():
    import csv, glob

    path2file = '/home/pedro/myPython/csv/'
    files = glob.glob(path2file + '*.csv')
    for f in files:
        print('The files are', f)

    myfile = input('Copy and paste a file here ... ')
    with open(myfile) as infile:
        my_csv_read = csv.reader(infile)
        data = []
        # csv reader is burned as soon as you read it, so make a list, lasts longer
        for line in my_csv_read:
            data.append(line)
            print(line)

    # working principle    
    x = '2022-04-25 7:07 PM'
    y = '2022-04-25 7:08 AM'

    if x > y:
        print('x is older than y')
    if x < y:
        print('x is younger than y')
        
    nums2delete = []
    # start at 1 if you have a headers row
   for i in range(1, len(data)):
        if data[i][0] >= '2022-04-25 7:07 AM':
            break
            
    wanted = data[0:i]



RE: Trying to delete rows above a specific datetime value - snippsat - May-06-2022

(May-06-2022, 07:45 AM)Pedroski55 Wrote: As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.
Not using pandas,can compare but most make the string to a datetime object first,or will just compare strings.
Also using pendulum make it easier as can parse automatically to a datetime object.
That also will handle time zones correctly.
>>> import pendulum
>>> 
>>> x = '2022-04-25 7:07 PM'
>>> y = '2022-04-25 7:08 AM'
>>> x_date = pendulum.parse(x, strict=False)
>>> y_date = pendulum.parse(y, strict=False)
>>> x_date
DateTime(2022, 4, 25, 19, 7, 0, tzinfo=Timezone('UTC'))
>>> 
>>> x_date < y_date
False
>>> x_date.diff_for_humans(y_date)
'11 hours after' 
If using pandas most also remember to covert dates to Pandas datetime.
Example:
>>> import pandas as pd
>>> 
>>> d = {'Dates': ['2022-04-25 7:07 PM', '2022-04-25 7:08 AM'], 'Date_value': [111, 999]}
>>> df = pandas.DataFrame(d)
>>> df
                Dates  Date_value
0  2022-04-25 7:07 PM         111
1  2022-04-25 7:08 AM         999
>>> 
>>> df.dtypes
Dates         object
Date_value     int64
dtype: object
>>> 
>>> # Convert column to Pandas datetime64
>>> df['Dates'] = pd.to_datetime(df['Dates'])
>>> df.dtypes
Dates         datetime64[ns]
Date_value             int64
dtype: object



RE: Trying to delete rows above a specific datetime value - cubangt - May-06-2022

So this is what i currently have that works as removing any records prior to the date provided, but unclear how to add the "time" portion into the equation so that i dont have to manually find the exact date/time and remove them after the file is created.

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-24')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]
What im at a loss with is, since the date and time are in separate columns, how do i adjust the mask to consider both as the criteria?

If i change the mask line to this i get a syntax error

mask = (df['Date'] > '2022-04-24') & (df['Time'] = '7:06 AM')



RE: Trying to delete rows above a specific datetime value - deanhystad - May-06-2022

Make a datetime column


RE: Trying to delete rows above a specific datetime value - cubangt - May-06-2022

As in a virtual column? I cant mess with the original csv.. so if you mean make a new virtual column out of those 2 columns, i can do that, but how?


RE: Trying to delete rows above a specific datetime value - cubangt - May-06-2022

Like this?

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# THIS CONVERTS THE TIME COLUMN INTO A DATETIME FORMAT
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
#mask = (df['Date'] > '2022-04-24')
mask = (df['DT'] > '2022-04-24 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]



RE: Trying to delete rows above a specific datetime value - deanhystad - May-06-2022

https://stackoverflow.com/questions/17978092/combine-date-and-time-columns-using-python-pandas


RE: Trying to delete rows above a specific datetime value - cubangt - May-06-2022

If i run my updated code above, i get this error:

Error:
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1) TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'