Trying to delete rows above a specific datetime value

cubangt · (This post was last modified: May-06-2022, 02:46 PM by cubangt.)

OK, i think this works.. so far the few tests dates and times have returned what i expected

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE TIME YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-28  7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

I spoke to soon..

Even though the above does return only records that are greater than the datetime provided in the mask, i just noticed that in the output file, now the date column value is the combined value, i cant have that..

I thought that combining the two columns was only for the purpose of the filtering.. i need to have the output exactly as it was prior to filtering out the older dates..

What did i do wrong?

**deanhystad** · (This post was last modified: May-06-2022, 04:40 PM by deanhystad.)

Create a new column which combines date and time. Remove items that are older than you threshold. Drop the datetime column you created. Write dataframe to CSV file.

cubangt · (This post was last modified: May-06-2022, 06:11 PM by cubangt.)

So to be clear, when you say create, thats just a temp column for the purpose of the filtering correct?
Did i not do that above in post #8?

cubangt · May-06-2022, 06:58 PM

Got it working.. thank you for the suggestion, i updated the code to this and produced the correct results in the updated/new csv file

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-04-28  7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
ndf = df.loc[mask]

cubangt · May-06-2022, 09:01 PM

MAN i really suck at this, the above code does work for filtering out the data, but i just tried to import it into the excel file and notice that the combined column was appended to the end of the columns, so instead of a 3 column CSV i now have a 4 column CSV and thats a no go type of problem..

What is wrong with the above code that its adding that 4th column?

**deanhystad** · (This post was last modified: May-06-2022, 10:33 PM by deanhystad.)

You must have forgot to drop the "DT" column before writing the CSV.

Another thing you could do is create the date-time as a separate dataframe.

import pandas as pd
import numpy as np

df = pd.DataFrame({"A":np.random.randint(1, 100, 10), "B":np.random.randint(1, 100, 10)})
apb = df["A"] + df["B"]  # <- This creates a new dataframe instead of adding a column to df.

df = df[apb < 100]
print(df)

Output:    A   B
1  30  25
2  75   1
3   9  73
4  43   3
5  26  63
7  50  48

cubangt · May-09-2022, 01:43 PM

It was the drop that i was missing..

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop

Added this and now works as expected. thank you

cubangt · May-09-2022, 08:19 PM

Well i spoke to fast, it was working fine in my test script, but now that i implemented the changes above to my actual script, im getting the datetime column at the end even though i have the df.drop before writing the new csv file..

How can that happen? What is wrong with this now? The replacing works, the datetime filter works, but when its written to the new text file, the temp column df['DT'] is appended to the end of the 3 original columns.

Unless i opened an old text file this morning when i checked, i thought this was already working..

# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), names=["Date", "Time", "Comment"])

df['Comment'] = [re.sub(r'(?:^|\W)Someone Else:(?:$|\W)','Dan: ', str(x)) for x in df['Comment']]
df['Comment'] = [re.sub(r'(?:^|\W)Elizabeth:(?:$|\W)','Ross: ', str(x)) for x in df['Comment']]

# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-05-02  9:32 AM')

# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop

# THIS IS THE FILTERED DATA RESULTS TO IMPORT INTO EXCEL
df.to_csv(r'C:\Users\mynewfile.txt', header=None, index=None, mode='a')

cubangt · May-09-2022, 08:46 PM

Added this parameter to my to_csv and it seems to be working as expected

columns=["Date", "Time", "Comment"], header=None, index=None, mode='a')

ill run a few more tests on the production file and make sure.

**deanhystad** · May-09-2022, 08:57 PM

This returns the drop function. It does nothing to modify df.

# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop

To drop 'DT' from df you can do either of these:

df = df.drop('DT')
# or
df.drop('DT', inplace=True)

Try reading the documentation. I mean really reading until you understand what it says. For me that usually takes 4-5 times through the first time I try something new.

https://pandas.pydata.org/pandas-docs/st....drop.html

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	delete specific row of entries	jacksfrustration	3	1,326	Feb-13-2024, 11:13 PM Last Post: deanhystad
	Reading Specific Rows In a CSV File	finndude	3	1,897	Dec-13-2022, 03:19 PM Last Post: finndude
	Pymysql delete specific rows in tableview	stsxbel	2	1,881	Aug-18-2022, 09:50 AM Last Post: ibreeden
	The code I have written removes the desired number of rows, but wrong rows	Jdesi1983	0	2,163	Dec-08-2021, 04:42 AM Last Post: Jdesi1983
	[SOLVED] Delete specific characters from string lines	EnfantNicolas	4	3,284	Oct-21-2021, 11:28 AM Last Post: EnfantNicolas
	openpyxl incorrect delete rows	VladislavM	6	6,222	Jul-19-2021, 08:54 AM Last Post: VladislavM
	Pandas DataFrame combine rows by column value, where Date Rows are NULL	rhat398	0	2,735	May-04-2021, 10:51 PM Last Post: rhat398
	Indexing [::-1] to Reverse ALL 2D Array Rows, ALL 3D, 4D Array Columns & Rows Python	Jeremy7	8	10,084	Mar-02-2021, 01:54 AM Last Post: Jeremy7
	How to sort rows based on specific order	Mekala	3	3,170	Jul-31-2020, 01:01 AM Last Post: bowlofred
	How to extract specific rows and columns from a text file with Python	Farhan	0	4,167	Mar-25-2020, 09:18 PM Last Post: Farhan

Trying to delete rows above a specific datetime value

User Panel Messages

Announcements