Python Forum
Drop rows that don't match
#1
Hi,
I've been working on data cleaning for a while (I've already posted some questions about this) and this one is a bit "complicated" for me (I'm a beginner :p).
This is what my raw file looks like: csv.file
How can I drop the rows that are not in a date format? (The column dtype is still object, NOT datetime64[ns]!)

I tried this approach :

df= pd.read_csv('file.csv', header = None)
for index, row in df.iterrows():
    if df[df[1].apply(lambda x: type(x)==int)]:
        df.drop(index, inplace=True)
but it didn't work.
Thanks,
Karlito
#2
Your csv.file link does not work.
You can do it like this. In pandas you check dtypes rather than type().
>>> import pandas as pd

>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  a  1  d
1  b  2  e
2  c  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df = df.select_dtypes(exclude=['object'])
>>> df
   y
0  1
1  2
2  3
Another approach is to convert to the correct types, if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  4  1  d
1  5  2  e
2  6  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x     int32 # Now integer
y     int64
z    object
dtype: object
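A side note on the conversion above: astype('int') raises a ValueError as soon as one value in the column is not a valid integer string. A minimal sketch, assuming you would rather mark such values as missing and then drop those rows, uses pd.to_numeric with errors='coerce'. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one value ("x7") cannot be parsed as a number
df = pd.DataFrame({"x": ["4", "5", "x7"], "y": [1, 2, 3]})

# errors='coerce' turns unparseable values into NaN instead of raising
converted = pd.to_numeric(df["x"], errors="coerce")

# Keep only the rows where the conversion succeeded
clean = df[converted.notna()]
print(clean)
```

The boolean mask filters rows, whereas select_dtypes filters whole columns.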
#3
(Oct-26-2019, 11:57 AM)snippsat Wrote: Your csv.file link does not work.

Hi, thanks for replying. I want to drop rows, not columns ... here is the file: file.csv

I want to drop rows that don't start with/look like a date (although the dtype is still object and not datetime64[ns]).
Thanks,
Karlito
#4
(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that don't start with/look like a date
OK, it's better if you post a sample of the CSV file rather than an image.
I cannot use data from an image to test with.
>>> import pandas as pd

>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
         date  value
0  12.06.2017      0
1        2003      1
2   114999999      2
3  20.06.2017      3
4   2.08.2018      4
5   554777777      5

# Rows to be dropped become NaT
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0   2017-06-12
1          NaT
2          NaT
3   2017-06-20
4   2018-08-02
5          NaT
Name: date, dtype: datetime64[ns]

# Apply
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna()
>>> df
        date  value
0 2017-06-12      0
3 2017-06-20      3
4 2018-08-02      4

# Convert the date back to the original format, if that's needed
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
         date  value
0  12.06.2017      0
3  20.06.2017      3
4  02.08.2018      4
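The same idea can also be written as a boolean mask, which leaves the original strings untouched and does not drop rows that happen to have NaN in other columns (plain dropna() drops a row if any column is missing). A minimal sketch, where keep_date_rows is a hypothetical helper name and not part of pandas:

```python
import pandas as pd


def keep_date_rows(df, column, fmt):
    """Return only the rows whose `column` value parses with `fmt`."""
    parsed = pd.to_datetime(df[column], format=fmt, errors="coerce")
    # notna() is True where parsing succeeded and False where it gave NaT
    return df[parsed.notna()]


df = pd.DataFrame({
    "date": ["12.06.2017", "2003", "114999999", "20.06.2017", "2.08.2018", "554777777"],
    "value": range(6),
})
result = keep_date_rows(df, "date", "%d.%m.%Y")
print(result)
```

Because the column is never overwritten, there is no need to format the dates back with strftime afterwards.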
#5
(Oct-26-2019, 04:51 PM)snippsat Wrote:
(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that don't start with/look like a date

Hi snippsat,

Thanks for your help, but I tried it and it doesn't work.
Link: error

import pandas as pd
 
df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
df

                  0  1
0  09.05.2017 13:56  0
1        1494331179  1
2        1494331625  2
3  09.05.2017 14:11  3
4         944006550  4
5  03.07.2017 16:50  5
#6
>>> import pandas as pd

>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0   2017-05-09 13:56:00
1                   NaT
2                   NaT
3   2017-05-09 14:11:00
4                   NaT
5   2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns]
You are missing %H:%M in the format string; your values also contain the time.
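Put together for a headerless CSV, a minimal sketch could look like the following. The inline text is a made-up stand-in for file.csv (the real file is not shown in the thread), shaped like the sample from post #5, and the comma separator is an assumption.

```python
import io

import pandas as pd

# Stand-in for file.csv: made-up rows in the shape posted above
raw = io.StringIO(
    "09.05.2017 13:56,0\n"
    "1494331179,1\n"
    "09.05.2017 14:11,3\n"
    "944006550,4\n"
    "03.07.2017 16:50,5\n"
)

# dtype=str keeps every column as text, so numbers and dates are treated alike
df = pd.read_csv(raw, header=None, dtype=str)

# Values that do not match the format become NaT and are filtered out
parsed = pd.to_datetime(df[0], format="%d.%m.%Y %H:%M", errors="coerce")
df = df[parsed.notna()].reset_index(drop=True)
print(df)
```

With a real file, pass its path to pd.read_csv instead of the StringIO object.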
#7
(Oct-28-2019, 10:55 AM)snippsat Wrote: You are missing %H:%M

Thanks ... sometimes I assume the mistake is bigger than it is without taking the time to analyze my code. Sorry about that.