Trying to delete rows above a specific datetime value

cubangt · May-05-2022, 09:45 PM

So i have a csv file, and the data is structured like this

2022-04-25, 7:06 AM, bla bla bla
2022-04-25, 7:07 AM, ha ha ha
2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

And so on..

I need to add a "delete" option that i can provide the date and time and have it find that line and delete it as well as everything above that line..

So if i provided:
2022-04-25, 7:07 AM

I want it to delete that line and all the lines above it so my file will end up with only

2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha

Im currently using these imports, can this be done with pandas?

import re
import pandas as pd

**deanhystad** · (This post was last modified: May-05-2022, 10:12 PM by deanhystad.)

You can make a new dataframe that only contains the values you want. Here I "delete" all values >= 50 and save the results in "low". You could do the same thing by comparing datetime objects.

import pandas as pd
import numpy as np

df = pd.DataFrame({"Letters":[a for a in 'ABCDEFG'], "Numbers":np.random.randint(1, 100, 7)})
low = df[df["Numbers"] < 50]
high = df[df["Numbers"] >= 50]
print("Low", low, sep="\n")
print("\nHigh", high, sep="\n")

Output:Low
  Letters  Numbers
1       B       26
2       C        8

High
  Letters  Numbers
0       A       74
3       D       56
4       E       74
5       F       56
6       G       51

Pedroski55 · (This post was last modified: May-06-2022, 07:46 AM by Pedroski55.)

I sometimes do a similar thing in MySQL, if I forget to set the correct date first.

As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.

def myApp():
    import csv, glob

    path2file = '/home/pedro/myPython/csv/'
    files = glob.glob(path2file + '*.csv')
    for f in files:
        print('The files are', f)

    myfile = input('Copy and paste a file here ... ')
    with open(myfile) as infile:
        my_csv_read = csv.reader(infile)
        data = []
        # csv reader is burned as soon as you read it, so make a list, lasts longer
        for line in my_csv_read:
            data.append(line)
            print(line)

    # working principle    
    x = '2022-04-25 7:07 PM'
    y = '2022-04-25 7:08 AM'

    if x > y:
        print('x is older than y')
    if x < y:
        print('x is younger than y')
        
    nums2delete = []
    # start at 1 if you have a headers row
   for i in range(1, len(data)):
        if data[i][0] >= '2022-04-25 7:07 AM':
            break
            
    wanted = data[0:i]

***snippsat*** · (This post was last modified: May-06-2022, 10:18 AM by snippsat.)

(May-06-2022, 07:45 AM)Pedroski55 Wrote: As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.

Not using pandas,can compare but most make the string to a datetime object first,or will just compare strings.
Also using pendulum make it easier as can parse automatically to a datetime object.
That also will handle time zones correctly.

>>> import pendulum
>>> 
>>> x = '2022-04-25 7:07 PM'
>>> y = '2022-04-25 7:08 AM'
>>> x_date = pendulum.parse(x, strict=False)
>>> y_date = pendulum.parse(y, strict=False)
>>> x_date
DateTime(2022, 4, 25, 19, 7, 0, tzinfo=Timezone('UTC'))
>>> 
>>> x_date < y_date
False
>>> x_date.diff_for_humans(y_date)
'11 hours after'

If using pandas most also remember to covert dates to Pandas datetime.
Example:

>>> import pandas as pd
>>> 
>>> d = {'Dates': ['2022-04-25 7:07 PM', '2022-04-25 7:08 AM'], 'Date_value': [111, 999]}
>>> df = pandas.DataFrame(d)
>>> df
                Dates  Date_value
0  2022-04-25 7:07 PM         111
1  2022-04-25 7:08 AM         999
>>> 
>>> df.dtypes
Dates         object
Date_value     int64
dtype: object
>>> 
>>> # Convert column to Pandas datetime64
>>> df['Dates'] = pd.to_datetime(df['Dates'])
>>> df.dtypes
Dates         datetime64[ns]
Date_value             int64
dtype: object

cubangt · (This post was last modified: May-06-2022, 01:40 PM by cubangt.)

So this is what i currently have that works as removing any records prior to the date provided, but unclear how to add the "time" portion into the equation so that i dont have to manually find the exact date/time and remove them after the file is created.

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-24')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

What im at a loss with is, since the date and time are in separate columns, how do i adjust the mask to consider both as the criteria?

If i change the mask line to this i get a syntax error

mask = (df['Date'] > '2022-04-24') & (df['Time'] = '7:06 AM')

**deanhystad** · May-06-2022, 01:43 PM

Make a datetime column

cubangt · May-06-2022, 01:51 PM

As in a virtual column? I cant mess with the original csv.. so if you mean make a new virtual column out of those 2 columns, i can do that, but how?

cubangt · May-06-2022, 01:58 PM

Like this?

#  THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# THIS CONVERTS THE TIME COLUMN INTO A DATETIME FORMAT
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
#mask = (df['Date'] > '2022-04-24')
mask = (df['DT'] > '2022-04-24 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

**deanhystad** · May-06-2022, 02:05 PM

https://stackoverflow.com/questions/1797...hon-pandas

cubangt · May-06-2022, 02:07 PM

If i run my updated code above, i get this error:

Error:    df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)

TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	delete specific row of entries	jacksfrustration	3	1,445	Feb-13-2024, 11:13 PM Last Post: deanhystad
	Reading Specific Rows In a CSV File	finndude	3	1,947	Dec-13-2022, 03:19 PM Last Post: finndude
	Pymysql delete specific rows in tableview	stsxbel	2	1,930	Aug-18-2022, 09:50 AM Last Post: ibreeden
	The code I have written removes the desired number of rows, but wrong rows	Jdesi1983	0	2,207	Dec-08-2021, 04:42 AM Last Post: Jdesi1983
	[SOLVED] Delete specific characters from string lines	EnfantNicolas	4	3,391	Oct-21-2021, 11:28 AM Last Post: EnfantNicolas
	openpyxl incorrect delete rows	VladislavM	6	6,364	Jul-19-2021, 08:54 AM Last Post: VladislavM
	Pandas DataFrame combine rows by column value, where Date Rows are NULL	rhat398	0	2,800	May-04-2021, 10:51 PM Last Post: rhat398
	Indexing [::-1] to Reverse ALL 2D Array Rows, ALL 3D, 4D Array Columns & Rows Python	Jeremy7	8	10,291	Mar-02-2021, 01:54 AM Last Post: Jeremy7
	How to sort rows based on specific order	Mekala	3	3,220	Jul-31-2020, 01:01 AM Last Post: bowlofred
	How to extract specific rows and columns from a text file with Python	Farhan	0	4,218	Mar-25-2020, 09:18 PM Last Post: Farhan

Trying to delete rows above a specific datetime value

User Panel Messages

Announcements