Python Forum

Full Version: drop rows that don't match
Hi,
I've been working on data cleaning for a while (I've already posted some questions about this), and this one is a bit "complicated" for me (beginner :p).
This is what my raw file looks like: csv.file
How can I drop the rows that are not in a date format? (The column type is still object, NOT datetime64[ns]!)

I tried this approach:

import pandas as pd

df = pd.read_csv('file.csv', header = None)
for index, row in df.iterrows():
    if df[df[1].apply(lambda x: type(x)==int)]:
        df.drop(index, inplace=True)
but it didn't work.
Thanks
Karlito
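Why the loop above fails: `df[df[1].apply(...)]` is itself a DataFrame, and using a DataFrame as an `if` condition raises `ValueError` because its truth value is ambiguous. Also, values read from a CSV into an object column are usually strings, so `type(x) == int` would not match them anyway. Below is a minimal sketch of the usual pandas alternative, a boolean mask with one True/False per row. The sample data is hypothetical, and it assumes the unwanted rows are the ones whose value consists of digits only:

```python
import pandas as pd

# Hypothetical stand-in for file.csv; column 1 holds the mixed values,
# as in the loop above
df = pd.DataFrame({
    0: [0, 1, 2, 3],
    1: ["09.05.2017 13:56", "1494331179", "944006550", "03.07.2017 16:50"],
})

# Using a whole DataFrame as an if-condition raises ValueError
error_message = None
try:
    if df[df[1].apply(lambda x: type(x) == int)]:
        pass
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# Boolean mask: True for rows whose value is digits only, e.g. '1494331179'
digits_only = df[1].str.isdigit()

# Keep every row where the mask is False; no loop and no drop() needed
cleaned = df[~digits_only]
print(cleaned)
```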
Your csv.file link does not work.
You can do it like this; in pandas you look at dtypes, not type().
>>> import pandas as pd

>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  a  1  d
1  b  2  e
2  c  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df = df.select_dtypes(exclude=['object'])
>>> df
   y
0  1
1  2
2  3
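To see the difference between a column's dtype and the Python type of each value, here is a small sketch that reads a hypothetical mixed column from an in-memory CSV (`io.StringIO` stands in for the real file). Because the column contains text that isn't numeric, pandas keeps it as a non-numeric column and each value comes back as a `str`, which is why a check like `type(x) == int` never matches:

```python
import io

import pandas as pd

# Hypothetical CSV content: column 0 mixes date strings and plain numbers
csv_text = "09.05.2017 13:56,0\n1494331179,1\n03.07.2017 16:50,2\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Column-level types: column 0 is non-numeric (object or a string dtype,
# depending on the pandas version), column 1 is an integer column
print(df.dtypes)

# Element-level Python types: every value in column 0 is a str, even '1494331179'
print(df[0].map(type).value_counts())
```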
Another approach is to convert columns to the correct type, if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  4  1  d
1  5  2  e
2  6  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x     int32 # Now integer (the width of 'int' is platform-dependent)
y     int64
z    object
dtype: object
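`astype('int')` only works when every value in the column can be converted; a single non-numeric string makes it raise `ValueError`. A sketch of a more forgiving variant on hypothetical data: `pd.to_numeric(..., errors='coerce')` turns unconvertible values into NaN, and those rows can then be dropped:

```python
import pandas as pd

# Hypothetical data: one value in column "x" is not a number
df = pd.DataFrame({"x": ["4", "5", "oops"], "y": [1, 2, 3]})

# astype('int') refuses to convert if any value is not a valid integer
astype_error = None
try:
    df["x"].astype("int")
except ValueError as exc:
    astype_error = str(exc)
print("astype failed:", astype_error)

# to_numeric with errors='coerce' turns unparseable values into NaN instead...
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# ...and dropna then removes those rows (x becomes float because of the NaN)
df = df.dropna(subset=["x"])
print(df)
```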
(Oct-26-2019, 11:57 AM) snippsat wrote: Your csv.file link does not work.

Hi, thanks for replying. I want to drop rows, not columns. Here is the file: file.csv

I want to drop the rows that don't start with or look like a date (even though the dtype is still object and not datetime64[ns]).
Thanks
Karlito
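Since the goal is rows that start with or look like a date, one alternative to parsing is a regular-expression check on the text. This sketch is an assumed alternative, not the parsing approach used in the replies: `Series.str.match` anchors the pattern at the start of each string, so it keeps values beginning with something shaped like `dd.mm.yyyy`. It only checks the shape, so an impossible date such as `45.13.2017` would still pass:

```python
import pandas as pd

# Hypothetical sample in the same shape as the thread's data (strings in column 0)
df = pd.DataFrame({
    0: ["09.05.2017 13:56", "1494331179", "944006550", "03.07.2017 16:50"],
    1: range(4),
})

# str.match anchors at the start of each string: keep values that begin with
# one or two digits, a dot, one or two digits, a dot, and four digits
starts_like_date = df[0].str.match(r"\d{1,2}\.\d{1,2}\.\d{4}")

df = df[starts_like_date]
print(df)
```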
(Oct-26-2019, 02:14 PM) karlito wrote: I want to drop rows that don't start/look like a date
OK, it's better to post a sample of the CSV file rather than an image.
I can't use data from an image to test with.
>>> import pandas as pd

>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
         date  value
0  12.06.2017      0
1        2003      1
2   114999999      2
3  20.06.2017      3
4   2.08.2018      4
5   554777777      5

# Values that don't match the format become NaT; those are the rows to drop
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0   2017-06-12
1          NaT
2          NaT
3   2017-06-20
4   2018-08-02
5          NaT
Name: date, dtype: datetime64[ns]

# Convert the column, then drop the rows that became NaT
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna(subset=['date'])
>>> df
        date  value
0 2017-06-12      0
3 2017-06-20      3
4 2018-08-02      4

# Convert back to the original string format, if that's needed (single-digit days come back zero-padded, e.g. 02.08.2018)
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
         date  value
0  12.06.2017      0
3  20.06.2017      3
4  02.08.2018      4
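A variation on the same idea, sketched with a hypothetical helper `drop_non_dates`: use the parsed result only as a boolean mask (`notna()`) and keep the original strings, so no `strftime` round-trip is needed and a value like `2.08.2018` keeps its original form:

```python
import pandas as pd


def drop_non_dates(df, column, fmt):
    """Hypothetical helper: keep only the rows whose `column` parses with `fmt`.

    The parsed values are used only as a mask, so the original strings are kept.
    """
    parsed = pd.to_datetime(df[column], format=fmt, errors="coerce")
    return df[parsed.notna()]


df = pd.DataFrame({
    "date": ["12.06.2017", "2003", "114999999", "20.06.2017", "2.08.2018", "554777777"],
    "value": range(6),
})
result = drop_non_dates(df, "date", "%d.%m.%Y")
print(result)
```

The trade-off is that the column stays a string column; if real datetimes are needed later, convert it after filtering.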
(Oct-26-2019, 04:51 PM) snippsat wrote:

Hi snippsat,

Thanks for your help, but I tried it and it doesn't work.
Link: error

import pandas as pd
 
df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
df

                  0  1
0  09.05.2017 13:56  0
1        1494331179  1
2        1494331625  2
3  09.05.2017 14:11  3
4         944006550  4
5  03.07.2017 16:50  5
>>> import pandas as pd

>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0   2017-05-09 13:56:00
1                   NaT
2                   NaT
3   2017-05-09 14:11:00
4                   NaT
5   2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns]
You are missing %H:%M in the format: your values have a time part (e.g. 13:56) as well as the date.
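Putting the thread together on the original problem, here is a sketch that reads the data with `pd.read_csv(..., header=None)` as in the first post, then converts with the corrected format and drops the rows that became NaT. `io.StringIO` holding the sample rows above stands in for `file.csv`, whose real contents are assumed to look like this:

```python
import io

import pandas as pd

# Stand-in for file.csv (assumed layout: date-or-number text, then a value)
csv_text = """09.05.2017 13:56,0
1494331179,1
1494331625,2
09.05.2017 14:11,3
944006550,4
03.07.2017 16:50,5
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# The format must include the time part; anything that doesn't match becomes NaT
df[0] = pd.to_datetime(df[0], format="%d.%m.%Y %H:%M", errors="coerce")

# Drop only the rows whose date column is NaT
df = df.dropna(subset=[0])
print(df)
```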
(Oct-28-2019, 10:55 AM) snippsat wrote:


Thanks! Sometimes I assume the mistake is bigger than it is instead of taking the time to analyze my code. Sorry about that.