Posts: 170
Threads: 43
Joined: May 2019
May-06-2022, 02:46 PM
(This post was last modified: May-06-2022, 02:46 PM by cubangt.)
OK, i think this works.. so far the few tests dates and times have returned what i expected
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE TIME YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-28 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask] I spoke to soon..
Even though the above does return only records that are greater than the datetime provided in the mask, i just noticed that in the output file, now the date column value is the combined value, i cant have that..
I thought that combining the two columns was only for the purpose of the filtering.. i need to have the output exactly as it was prior to filtering out the older dates..
What did i do wrong?
Posts: 6,779
Threads: 20
Joined: Feb 2020
May-06-2022, 04:39 PM
(This post was last modified: May-06-2022, 04:40 PM by deanhystad.)
Create a new column which combines date and time. Remove items that are older than you threshold. Drop the datetime column you created. Write dataframe to CSV file.
Posts: 170
Threads: 43
Joined: May 2019
May-06-2022, 06:08 PM
(This post was last modified: May-06-2022, 06:11 PM by cubangt.)
So to be clear, when you say create, thats just a temp column for the purpose of the filtering correct?
Did i not do that above in post #8?
Posts: 170
Threads: 43
Joined: May 2019
Got it working.. thank you for the suggestion, i updated the code to this and produced the correct results in the updated/new csv file
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-04-28 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
ndf = df.loc[mask]
Posts: 170
Threads: 43
Joined: May 2019
MAN i really suck at this, the above code does work for filtering out the data, but i just tried to import it into the excel file and notice that the combined column was appended to the end of the columns, so instead of a 3 column CSV i now have a 4 column CSV and thats a no go type of problem..
What is wrong with the above code that its adding that 4th column?
Posts: 6,779
Threads: 20
Joined: Feb 2020
May-06-2022, 10:33 PM
(This post was last modified: May-06-2022, 10:33 PM by deanhystad.)
You must have forgot to drop the "DT" column before writing the CSV.
Another thing you could do is create the date-time as a separate dataframe.
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":np.random.randint(1, 100, 10), "B":np.random.randint(1, 100, 10)})
apb = df["A"] + df["B"] # <- This creates a new dataframe instead of adding a column to df.
df = df[apb < 100]
print(df) Output: A B
1 30 25
2 75 1
3 9 73
4 43 3
5 26 63
7 50 48
Posts: 170
Threads: 43
Joined: May 2019
It was the drop that i was missing..
# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop Added this and now works as expected. thank you
Posts: 170
Threads: 43
Joined: May 2019
Well i spoke to fast, it was working fine in my test script, but now that i implemented the changes above to my actual script, im getting the datetime column at the end even though i have the df.drop before writing the new csv file..
How can that happen? What is wrong with this now? The replacing works, the datetime filter works, but when its written to the new text file, the temp column df['DT'] is appended to the end of the 3 original columns.
Unless i opened an old text file this morning when i checked, i thought this was already working..
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), names=["Date", "Time", "Comment"])
df['Comment'] = [re.sub(r'(?:^|\W)Someone Else:(?:$|\W)','Dan: ', str(x)) for x in df['Comment']]
df['Comment'] = [re.sub(r'(?:^|\W)Elizabeth:(?:$|\W)','Ross: ', str(x)) for x in df['Comment']]
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['DT'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['DT'] > '2022-05-02 9:32 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]
# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop
# THIS IS THE FILTERED DATA RESULTS TO IMPORT INTO EXCEL
df.to_csv(r'C:\Users\mynewfile.txt', header=None, index=None, mode='a')
Posts: 170
Threads: 43
Joined: May 2019
Added this parameter to my to_csv and it seems to be working as expected
columns=["Date", "Time", "Comment"], header=None, index=None, mode='a') ill run a few more tests on the production file and make sure.
Posts: 6,779
Threads: 20
Joined: Feb 2020
This returns the drop function. It does nothing to modify df.
# DROP TEMP COLUMN BEFORE WRITING CSV
df['DT'].drop To drop 'DT' from df you can do either of these:
df = df.drop('DT')
# or
df.drop('DT', inplace=True) Try reading the documentation. I mean really reading until you understand what it says. For me that usually takes 4-5 times through the first time I try something new.
https://pandas.pydata.org/pandas-docs/st....drop.html
|