Posts: 48
Threads: 17
Joined: Oct 2019
Oct-25-2019, 05:58 PM
(This post was last modified: Oct-25-2019, 05:58 PM by karlito.)
Hi,
I'm working on data cleaning for a while (already posted some questions about this  ) and this one is really bit "complicated" for me(beginner :p).
So this is how my raw file looks like: csv.file
How can I drop those rows that are not in a dateformat(the column type is still an object NOT datetime64[ns]!!!)
I tried this approach :
df= pd.read_csv('file.csv', header = None)
for index, row in df.iterrows():
if df[df[1].apply(lambda x: type(x)==int)]:
df.drop(index, inplace=True) but didnt work
Thks
Karlito
Posts: 7,318
Threads: 123
Joined: Sep 2016
Oct-26-2019, 11:57 AM
(This post was last modified: Oct-26-2019, 11:57 AM by snippsat.)
Your csv.file link dos not work.
Can do it like this,it's not type() in pandas but dtypes .
>>> import pandas as pd
>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})
>>> df
x y z
0 a 1 d
1 b 2 e
2 c 3 f
>>> df.dtypes
x object
y int64
z object
dtype: object
>>> df = df.select_dtypes(exclude=['object'])
>>> df
y
0 1
1 2
2 3 An other approach is to convert to correct types if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})
>>> df
x y z
0 4 1 d
1 5 2 e
2 6 3 f
>>> df.dtypes
x object
y int64
z object
dtype: object
>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x int32 # Now integer
y int64
z object
dtype: object
Posts: 48
Threads: 17
Joined: Oct 2019
(Oct-26-2019, 11:57 AM)snippsat Wrote: Your csv.file link dos not work.
Can do it like this,it's not type() in pandas but dtypes .
>>> import pandas as pd
>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})
>>> df
x y z
0 a 1 d
1 b 2 e
2 c 3 f
>>> df.dtypes
x object
y int64
z object
dtype: object
>>> df = df.select_dtypes(exclude=['object'])
>>> df
y
0 1
1 2
2 3 An other approach is to convert to correct types if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})
>>> df
x y z
0 4 1 d
1 5 2 e
2 6 3 f
>>> df.dtypes
x object
y int64
z object
dtype: object
>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x int32 # Now integer
y int64
z object
dtype: object
Hi Thks for replying. I want to drop rows not columns ... hier ist the file file.csv
I want to drop rows that doesnt start/look like a date (although the dtypes is still an object and not dateimt64[ns])
Thks
Karlito
Posts: 7,318
Threads: 123
Joined: Sep 2016
Oct-26-2019, 04:51 PM
(This post was last modified: Oct-26-2019, 04:51 PM by snippsat.)
(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a date Ok,is better if you post a sample of csv file and not image.
Can not use data from a image to test with.
>>> import pandas as pd
>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
date value
0 12.06.2017 0
1 2003 1
2 114999999 2
3 20.06.2017 3
4 2.08.2018 4
5 554777777 5
# Make NaT of row to be dropped
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0 2017-06-12
1 NaT
2 NaT
3 2017-06-20
4 2018-08-02
5 NaT
Name: date, dtype: datetime64[ns]
# Apply
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna()
>>> df
date value
0 2017-06-12 0
3 2017-06-20 3
4 2018-08-02 4
# Turn date around to original format if that's needed
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
date value
0 12.06.2017 0
3 20.06.2017 3
4 02.08.2018 4
Posts: 48
Threads: 17
Joined: Oct 2019
Oct-28-2019, 10:19 AM
(This post was last modified: Oct-28-2019, 10:20 AM by karlito.)
(Oct-26-2019, 04:51 PM)snippsat Wrote: (Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a date Ok,is better if you post a sample of csv file and not image.
Can not use data from a image to test with.
>>> import pandas as pd
>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
date value
0 12.06.2017 0
1 2003 1
2 114999999 2
3 20.06.2017 3
4 2.08.2018 4
5 554777777 5
# Make NaT of row to be dropped
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0 2017-06-12
1 NaT
2 NaT
3 2017-06-20
4 2018-08-02
5 NaT
Name: date, dtype: datetime64[ns]
# Apply
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna()
>>> df
date value
0 2017-06-12 0
3 2017-06-20 3
4 2018-08-02 4
# Turn date around to original format if that's needed
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
date value
0 12.06.2017 0
3 20.06.2017 3
4 02.08.2018 4
Hi Snippsat,
Thks for your help but I tried it and it doesn't work
link : error
import pandas as pd
df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
df
0 1
0 09.05.2017 13:56 0
1 1494331179 1
2 1494331625 2
3 09.05.2017 14:11 3
4 944006550 4
5 03.07.2017 16:50 5
Posts: 7,318
Threads: 123
Joined: Sep 2016
>>> import pandas as pd
>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0 2017-05-09 13:56:00
1 NaT
2 NaT
3 2017-05-09 14:11:00
4 NaT
5 2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns] You are missing %H : %M
Posts: 48
Threads: 17
Joined: Oct 2019
(Oct-28-2019, 10:55 AM)snippsat Wrote: >>> import pandas as pd
>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0 2017-05-09 13:56:00
1 NaT
2 NaT
3 2017-05-09 14:11:00
4 NaT
5 2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns] You are missing %H: %M
Thks ... sometimes I think the mistake is too big without taking the time to analyze my code. sorry about that.
|