Bottom Page

Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
 drop rows that doesnt matched
#1
Hi,
I'm working on data cleaning for a while (already posted some questions about this Wink ) and this one is really bit "complicated" for me(beginner :p).
So this is how my raw file looks like: csv.file
How can I drop those rows that are not in a dateformat(the column type is still an object NOT datetime64[ns]!!!)

I tried this approach :

df= pd.read_csv('file.csv', header = None)
for index, row in df.iterrows():
    if df[df[1].apply(lambda x: type(x)==int)]:
        df.drop(index, inplace=True)
but didnt work
Thks
Karlito
Quote
#2
Your csv.file link dos not work.
Can do it like this,it's not type() in pandas but dtypes.
>>> import pandas as pd

>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  a  1  d
1  b  2  e
2  c  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df = df.select_dtypes(exclude=['object'])
>>> df
   y
0  1
1  2
2  3
An other approach is to convert to correct types if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  4  1  d
1  5  2  e
2  6  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x     int32 # Now integer
y     int64
z    object
dtype: object
Quote
#3
(Oct-26-2019, 11:57 AM)snippsat Wrote: Your csv.file link dos not work.
Can do it like this,it's not type() in pandas but dtypes.
>>> import pandas as pd

>>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  a  1  d
1  b  2  e
2  c  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df = df.select_dtypes(exclude=['object'])
>>> df
   y
0  1
1  2
2  3
An other approach is to convert to correct types if that's needed.
>>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]})

>>> df
   x  y  z
0  4  1  d
1  5  2  e
2  6  3  f

>>> df.dtypes
x    object
y     int64
z    object
dtype: object

>>> df['x'] = df['x'].astype('int')
>>> df.dtypes
x     int32 # Now integer
y     int64
z    object
dtype: object

Hi Thks for replying. I want to drop rows not columns ... hier ist the file file.csv

I want to drop rows that doesnt start/look like a date (although the dtypes is still an object and not dateimt64[ns])
Thks
Karlito
Quote
#4
(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a date
Ok,is better if you post a sample of csv file and not image.
Can not use data from a image to test with.
>>> import pandas as pd

>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
         date  value
0  12.06.2017      0
1        2003      1
2   114999999      2
3  20.06.2017      3
4   2.08.2018      4
5   554777777      5

# Make NaT of row to be dropped
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0   2017-06-12
1          NaT
2          NaT
3   2017-06-20
4   2018-08-02
5          NaT
Name: date, dtype: datetime64[ns]

# Apply
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna()
>>> df
        date  value
0 2017-06-12      0
3 2017-06-20      3
4 2018-08-02      4

# Turn date around to original format if that's needed
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
         date  value
0  12.06.2017      0
3  20.06.2017      3
4  02.08.2018      4
Quote
#5
(Oct-26-2019, 04:51 PM)snippsat Wrote:
(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a date
Ok,is better if you post a sample of csv file and not image.
Can not use data from a image to test with.
>>> import pandas as pd

>>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)})
>>> df
         date  value
0  12.06.2017      0
1        2003      1
2   114999999      2
3  20.06.2017      3
4   2.08.2018      4
5   554777777      5

# Make NaT of row to be dropped
>>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
0   2017-06-12
1          NaT
2          NaT
3   2017-06-20
4   2018-08-02
5          NaT
Name: date, dtype: datetime64[ns]

# Apply
>>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce')
>>> df = df.dropna()
>>> df
        date  value
0 2017-06-12      0
3 2017-06-20      3
4 2018-08-02      4

# Turn date around to original format if that's needed
>>> df['date'] = df['date'].dt.strftime('%d.%m.%Y')
>>> df
         date  value
0  12.06.2017      0
3  20.06.2017      3
4  02.08.2018      4

Hi Snippsat,

Thks for your help but I tried it and it doesn't work
link : error

import pandas as pd
 
df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
df

	              0	    1
0	09.05.2017 13:56	0
1	1494331179	        1
2	1494331625	        2
3	09.05.2017 14:11	3
4	944006550	        4
5	03.07.2017 16:50	5
Quote
#6
>>> import pandas as pd

>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0   2017-05-09 13:56:00
1                   NaT
2                   NaT
3   2017-05-09 14:11:00
4                   NaT
5   2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns]
You are missing %H:%M
karlito likes this post
Quote
#7
(Oct-28-2019, 10:55 AM)snippsat Wrote:
>>> import pandas as pd

>>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)})
>>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce')
0   2017-05-09 13:56:00
1                   NaT
2                   NaT
3   2017-05-09 14:11:00
4                   NaT
5   2017-07-03 16:50:00
Name: 0, dtype: datetime64[ns]
You are missing %H:%M

Bonkself

Thks ... sometimes I think the mistake is too big without taking the time to analyze my code. sorry about that.
Quote

Top Page

Possibly Related Threads...
Thread Author Replies Views Last Post
  Drop rows if a set of columns has a value dervast 1 364 Sep-12-2019, 04:18 PM
Last Post: sd_0912
  Drop rows from data with zero value Devilish 3 931 Dec-27-2018, 02:06 AM
Last Post: Devilish

Forum Jump:


Users browsing this thread: 1 Guest(s)