drop rows that doesnt matched - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: Data Science (https://python-forum.io/forum-44.html) +--- Thread: drop rows that doesnt matched (/thread-22034.html) |
drop rows that doesnt matched - karlito - Oct-25-2019 Hi, I'm working on data cleaning for a while (already posted some questions about this ) and this one is really bit "complicated" for me(beginner :p). So this is how my raw file looks like: csv.file How can I drop those rows that are not in a dateformat(the column type is still an object NOT datetime64[ns]!!!) I tried this approach : df= pd.read_csv('file.csv', header = None) for index, row in df.iterrows(): if df[df[1].apply(lambda x: type(x)==int)]: df.drop(index, inplace=True)but didnt work Thks Karlito RE: drop rows that doesnt matched - snippsat - Oct-26-2019 Your csv.file link dos not work.Can do it like this,it's not type() in pandas but dtypes .>>> import pandas as pd >>> df = pd.DataFrame({"x": ["a", "b", "c"], "y": [1, 2, 3], "z": ["d", "e", "f"]}) >>> df x y z 0 a 1 d 1 b 2 e 2 c 3 f >>> df.dtypes x object y int64 z object dtype: object >>> df = df.select_dtypes(exclude=['object']) >>> df y 0 1 1 2 2 3An other approach is to convert to correct types if that's needed. >>> df = pd.DataFrame({"x": ["4", "5", "6"], "y": [1, 2, 3], "z": ["d", "e", "f"]}) >>> df x y z 0 4 1 d 1 5 2 e 2 6 3 f >>> df.dtypes x object y int64 z object dtype: object >>> df['x'] = df['x'].astype('int') >>> df.dtypes x int32 # Now integer y int64 z object dtype: object RE: drop rows that doesnt matched - karlito - Oct-26-2019 (Oct-26-2019, 11:57 AM)snippsat Wrote: Your Hi Thks for replying. I want to drop rows not columns ... hier ist the file file.csv I want to drop rows that doesnt start/look like a date (although the dtypes is still an object and not dateimt64[ns]) Thks Karlito RE: drop rows that doesnt matched - snippsat - Oct-26-2019 (Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a dateOk,is better if you post a sample of csv file and not image. Can not use data from a image to test with. >>> import pandas as pd >>> df = pd.DataFrame({'date': ['12.06.2017', '2003', '114999999', '20.06.2017', '2.08.2018', '554777777'],'value': range(6)}) >>> df date value 0 12.06.2017 0 1 2003 1 2 114999999 2 3 20.06.2017 3 4 2.08.2018 4 5 554777777 5 # Make NaT of row to be dropped >>> pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce') 0 2017-06-12 1 NaT 2 NaT 3 2017-06-20 4 2018-08-02 5 NaT Name: date, dtype: datetime64[ns] # Apply >>> df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y', errors='coerce') >>> df = df.dropna() >>> df date value 0 2017-06-12 0 3 2017-06-20 3 4 2018-08-02 4 # Turn date around to original format if that's needed >>> df['date'] = df['date'].dt.strftime('%d.%m.%Y') >>> df date value 0 12.06.2017 0 3 20.06.2017 3 4 02.08.2018 4 RE: drop rows that doesnt matched - karlito - Oct-28-2019 (Oct-26-2019, 04:51 PM)snippsat Wrote:(Oct-26-2019, 02:14 PM)karlito Wrote: I want to drop rows that doesnt start/look like a dateOk,is better if you post a sample of csv file and not image. Hi Snippsat, Thks for your help but I tried it and it doesn't work link : error import pandas as pd df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)}) df 0 1 0 09.05.2017 13:56 0 1 1494331179 1 2 1494331625 2 3 09.05.2017 14:11 3 4 944006550 4 5 03.07.2017 16:50 5 RE: drop rows that doesnt matched - snippsat - Oct-28-2019 >>> import pandas as pd >>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)}) >>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce') 0 2017-05-09 13:56:00 1 NaT 2 NaT 3 2017-05-09 14:11:00 4 NaT 5 2017-07-03 16:50:00 Name: 0, dtype: datetime64[ns]You are missing %H : %M
RE: drop rows that doesnt matched - karlito - Oct-28-2019 (Oct-28-2019, 10:55 AM)snippsat Wrote:>>> import pandas as pd >>> df = pd.DataFrame({0: ['09.05.2017 13:56', '1494331179', '1494331625', '09.05.2017 14:11', '944006550', '03.07.2017 16:50'],1: range(6)}) >>> pd.to_datetime(df[0], format='%d.%m.%Y %H:%M', errors='coerce') 0 2017-05-09 13:56:00 1 NaT 2 NaT 3 2017-05-09 14:11:00 4 NaT 5 2017-07-03 16:50:00 Name: 0, dtype: datetime64[ns]You are missing %H Thks ... sometimes I think the mistake is too big without taking the time to analyze my code. sorry about that. |