Posts: 170
Threads: 43
Joined: May 2019
So i have a csv file, and the data is structured like this
2022-04-25, 7:06 AM, bla bla bla
2022-04-25, 7:07 AM, ha ha ha
2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha
And so on..
I need to add a "delete" option that i can provide the date and time and have it find that line and delete it as well as everything above that line..
So if i provided:
2022-04-25, 7:07 AM
I want it to delete that line and all the lines above it so my file will end up with only
2022-04-25, 7:08 AM, ha ha ha
2022-04-25, 7:09 AM, ha ha ha
Im currently using these imports, can this be done with pandas?
import re
import pandas as pd
Posts: 6,799
Threads: 20
Joined: Feb 2020
May-05-2022, 10:12 PM
(This post was last modified: May-05-2022, 10:12 PM by deanhystad.)
You can make a new dataframe that only contains the values you want. Here I "delete" all values >= 50 and save the results in "low". You could do the same thing by comparing datetime objects.
import pandas as pd
import numpy as np
df = pd.DataFrame({"Letters":[a for a in 'ABCDEFG'], "Numbers":np.random.randint(1, 100, 7)})
low = df[df["Numbers"] < 50]
high = df[df["Numbers"] >= 50]
print("Low", low, sep="\n")
print("\nHigh", high, sep="\n") Output: Low
Letters Numbers
1 B 26
2 C 8
High
Letters Numbers
0 A 74
3 D 56
4 E 74
5 F 56
6 G 51
Posts: 1,094
Threads: 143
Joined: Jul 2017
May-06-2022, 07:45 AM
(This post was last modified: May-06-2022, 07:46 AM by Pedroski55.)
I sometimes do a similar thing in MySQL, if I forget to set the correct date first.
As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice.
def myApp():
import csv, glob
path2file = '/home/pedro/myPython/csv/'
files = glob.glob(path2file + '*.csv')
for f in files:
print('The files are', f)
myfile = input('Copy and paste a file here ... ')
with open(myfile) as infile:
my_csv_read = csv.reader(infile)
data = []
# csv reader is burned as soon as you read it, so make a list, lasts longer
for line in my_csv_read:
data.append(line)
print(line)
# working principle
x = '2022-04-25 7:07 PM'
y = '2022-04-25 7:08 AM'
if x > y:
print('x is older than y')
if x < y:
print('x is younger than y')
nums2delete = []
# start at 1 if you have a headers row
for i in range(1, len(data)):
if data[i][0] >= '2022-04-25 7:07 AM':
break
wanted = data[0:i]
Posts: 7,320
Threads: 123
Joined: Sep 2016
May-06-2022, 10:17 AM
(This post was last modified: May-06-2022, 10:18 AM by snippsat.)
(May-06-2022, 07:45 AM)Pedroski55 Wrote: As far as I know, you can compare dates directly。
Assuming your data is in ascending order, you can find the position where you want to cut off an take a slice. Not using pandas,can compare but most make the string to a datetime object first,or will just compare strings.
Also using pendulum make it easier as can parse automatically to a datetime object.
That also will handle time zones correctly.
>>> import pendulum
>>>
>>> x = '2022-04-25 7:07 PM'
>>> y = '2022-04-25 7:08 AM'
>>> x_date = pendulum.parse(x, strict=False)
>>> y_date = pendulum.parse(y, strict=False)
>>> x_date
DateTime(2022, 4, 25, 19, 7, 0, tzinfo=Timezone('UTC'))
>>>
>>> x_date < y_date
False
>>> x_date.diff_for_humans(y_date)
'11 hours after' If using pandas most also remember to covert dates to Pandas datetime.
Example:
>>> import pandas as pd
>>>
>>> d = {'Dates': ['2022-04-25 7:07 PM', '2022-04-25 7:08 AM'], 'Date_value': [111, 999]}
>>> df = pandas.DataFrame(d)
>>> df
Dates Date_value
0 2022-04-25 7:07 PM 111
1 2022-04-25 7:08 AM 999
>>>
>>> df.dtypes
Dates object
Date_value int64
dtype: object
>>>
>>> # Convert column to Pandas datetime64
>>> df['Dates'] = pd.to_datetime(df['Dates'])
>>> df.dtypes
Dates datetime64[ns]
Date_value int64
dtype: object
Posts: 170
Threads: 43
Joined: May 2019
May-06-2022, 01:40 PM
(This post was last modified: May-06-2022, 01:40 PM by cubangt.)
So this is what i currently have that works as removing any records prior to the date provided, but unclear how to add the "time" portion into the equation so that i dont have to manually find the exact date/time and remove them after the file is created.
# THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
mask = (df['Date'] > '2022-04-24')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask] What im at a loss with is, since the date and time are in separate columns, how do i adjust the mask to consider both as the criteria?
If i change the mask line to this i get a syntax error
mask = (df['Date'] > '2022-04-24') & (df['Time'] = '7:06 AM')
Posts: 6,799
Threads: 20
Joined: Feb 2020
Posts: 170
Threads: 43
Joined: May 2019
As in a virtual column? I cant mess with the original csv.. so if you mean make a new virtual column out of those 2 columns, i can do that, but how?
Posts: 170
Threads: 43
Joined: May 2019
Like this?
# THIS CLOSES THE CLEANED DATES FILE
file2.close()
# THIS OPENS THE NEWLY CLEAN DATA IN ORDER TO REMOVE OLD RECORDS
df = pd.read_csv("Dates.txt", usecols=range(3), header=None, names=["Date", "Time", "Comment"])
# THIS CONVERTS THE DATE COLUMN INTO A DATETIME FORMAT
df['Date'] = pd.to_datetime(df['Date'])
# THIS CONVERTS THE TIME COLUMN INTO A DATETIME FORMAT
df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)
# HERE YOU NEED TO PROVIDE THE DATE YOU WANT TO KEEP GOING FORWARD
#mask = (df['Date'] > '2022-04-24')
mask = (df['DT'] > '2022-04-24 7:06 AM')
# THIS RETURNS ALL ROWS GREATER THAN THE DATE PROVIDED ABOVE
df = df.loc[mask]
Posts: 6,799
Threads: 20
Joined: Feb 2020
Posts: 170
Threads: 43
Joined: May 2019
If i run my updated code above, i get this error:
Error: df['DT'] = df.apply(lambda row: row.Date + row.Time, axis=1)
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'str'
|