pandas restricting csv read to certain rows
Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-6756.html
pandas restricting csv read to certain rows - metalray - Dec-06-2017

Dear Pandas Experts,

I am trying to extract data from a .csv file that contains columns called CarId, IssueDate.

    import pandas as pd
    train = pd.read_csv('train.csv', index_col=False, encoding="ISO-8859-1")

The issue date is in the format "mm/dd/yyyy". I want to get only those rows that have a year between 2012 and 2016. Can someone help with that? I have no idea how to make this efficient, i.e. maybe by filtering before all the data is extracted.

RE: pandas restricting csv read to certain rows - DeaD_EyE - Dec-06-2017

You should look here:
https://stackoverflow.com/questions/17465045/can-pandas-automatically-recognize-dates
https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates

    import pandas as pd

    # parse_dates converts the 'date' column to datetime while reading
    df = pd.read_csv('dates.csv', delimiter=';', parse_dates=['date'])
    print(df.dtypes)
    # keep rows strictly after 2012-01-01 and strictly before 2016-01-01
    mask = (df['date'] > pd.Timestamp(2012, 1, 1)) & (df['date'] < pd.Timestamp(2016, 1, 1))
    print(df[mask])

RE: pandas restricting csv read to certain rows - metalray - Dec-07-2017

Hi Dead_Eye,

Many thanks for your reply. I tried what you suggested, but even though there are years in range, nothing gets extracted.

    mask = (pd.DatetimeIndex(train_df['ticket_issued_date']).year > 2012) & (pd.DatetimeIndex(train_df['ticket_issued_date']).year < 2016)
    print(train_df[mask])  # is empty
    train_df['yearcolumn'] = pd.DatetimeIndex(train_df['ticket_issued_date']).year
    print(train_df['yearcolumn'].unique())
    # output: [2004 2005 2006 2007 1938 1963 1988 2008 2009 2010 2011]

RE: pandas restricting csv read to certain rows - metalray - Dec-12-2017

Can someone help with this? I wonder why the condition filters out all rows.

RE: pandas restricting csv read to certain rows - snippsat - Dec-12-2017

(Dec-12-2017, 01:22 PM)metalray Wrote: Can someone help with this? I wonder why the condition filters out all rows.

As always with Pandas questions, it is simpler to answer if you provide sample input data that can be run.
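Here I generate month-end dates from 2013 to 2017, then take out the dates from September 2015 to November 2016.

    import numpy as np
    import pandas as pd

    # 60 rows of random numbers, one row per month
    df = pd.DataFrame(np.random.random((60, 3)))
    # 'M' is the month-end frequency (newer pandas versions spell it 'ME')
    df['date'] = pd.date_range('2013-1-1', periods=60, freq='M')
    # the date strings are compared against the datetime column; both bounds are inclusive
    mask = (df['date'] >= '09-01-2015') & (df['date'] <= '11-30-2016')
    print(df.loc[mask])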
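A note on the empty result reported earlier in the thread: the unique() output posted there lists no year later than 2011, so a mask asking for years after 2012 would come back empty on that data. Below is a minimal sketch of checking which years are present before filtering. The inline rows are made up and stand in for train.csv, and the helper name filter_by_year is an assumption for illustration. Only the column name ticket_issued_date and the mm/dd/yyyy format come from the posts above.

```python
import io
import pandas as pd

# Hypothetical sample standing in for train.csv; dates use mm/dd/yyyy as in the thread.
SAMPLE_CSV = """CarId,ticket_issued_date
1,03/15/2011
2,07/04/2012
3,12/31/2014
4,01/01/2016
5,06/30/2017
"""

def filter_by_year(df, column, first_year, last_year):
    """Return rows whose year in `column` lies in [first_year, last_year], inclusive."""
    years = df[column].dt.year
    return df[years.between(first_year, last_year)]

df = pd.read_csv(io.StringIO(SAMPLE_CSV))
# Parse with an explicit format so 03/15/2011 is read as month/day/year.
df["ticket_issued_date"] = pd.to_datetime(df["ticket_issued_date"], format="%m/%d/%Y")

# Look at the years that are actually present before filtering; an empty
# result can simply mean the requested range is not in the data.
years = df["ticket_issued_date"].dt.year
print(years.min(), years.max())

selected = filter_by_year(df, "ticket_issued_date", 2012, 2016)
print(selected)
```

Series.between includes both endpoints by default, which matches "a year between 2012 and 2016" if 2012 and 2016 themselves should be kept. Use strict comparisons instead if they should not.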
RE: pandas restricting csv read to certain rows - metalray - Dec-16-2017

Got it. Thanks!
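On the efficiency part of the original question (filtering before all the data is extracted): one possible approach is to read the file in chunks and keep only the matching rows from each chunk. This is a sketch under assumptions, not something proposed in the thread. The function name read_rows_in_year_range and the inline sample rows are hypothetical, and the column names CarId and IssueDate are taken from the first post. pandas still has to scan every line of the file; what chunking limits is how much of it is held in memory at once.

```python
import io
import pandas as pd

# Hypothetical sample standing in for train.csv (mm/dd/yyyy dates).
SAMPLE_CSV = """CarId,IssueDate
1,03/15/2011
2,07/04/2012
3,12/31/2014
4,01/01/2016
5,06/30/2017
"""

def read_rows_in_year_range(source, date_column, first_year, last_year, chunksize=100_000):
    """Read a CSV in chunks, keeping only rows whose year is in the inclusive range."""
    kept = []
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        chunk[date_column] = pd.to_datetime(chunk[date_column], format="%m/%d/%Y")
        in_range = chunk[date_column].dt.year.between(first_year, last_year)
        kept.append(chunk[in_range])
    # Assumes the file has at least one data row; concat raises on an empty list.
    return pd.concat(kept, ignore_index=True)

# A tiny chunksize is used here only to exercise the chunking on the small sample.
result = read_rows_in_year_range(io.StringIO(SAMPLE_CSV), "IssueDate", 2012, 2016, chunksize=2)
print(result)
```

For a real file, pass the path (for example 'train.csv') instead of the StringIO object, along with any encoding argument the file needs.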