Python Forum
pandas restricting csv read to certain rows
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
pandas restricting csv read to certain rows
#1
Dear Pandas Experts,

I am tryig to extract data from a .csv file that contains columns called CarId, IssueDate

import pandas as pd
    train = pd.read_csv('train.csv', index_col=False, encoding="ISO-8859-1")
The issue date is of format "mm/dd/yyyy".
I want to get only those rows that have a year between 2012 and 2016.

Can someone help with that? I have no idea how to make this efficient i.e
maybe filtering before all the data is extracted.
Reply
#2
You should look here:
https://stackoverflow.com/questions/1746...nize-dates
https://stackoverflow.com/questions/2937...-two-dates

import pandas as pd


df = pd.read_csv('dates.csv', delimiter=';', parse_dates=['date'])
print(df.dtypes)
mask =(df['date'] > pd.Timestamp(2012,1,1)) & (df['date'] < pd.Timestamp(2016,1,1))
print(df[mask])
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#3
Hi Dead_Eye,

Many thanks for your reply.
I tried what you suggested but even though there are years in range, nothing gets extracted.

    mask =(pd.DatetimeIndex(train_df['ticket_issued_date']).year > 2012) & (pd.DatetimeIndex(train_df['ticket_issued_date']).year < 2016)
    print(train_df[mask]) # is empty

    train_df['yearcolumn'] = pd.DatetimeIndex(train_df['ticket_issued_date']).year
    print(train_df['yearcolumn'].unique())
    #output [2004 2005 2006 2007 1938 1963 1988 2008 2009 2010 2011]
Reply
#4
Can someone help with this? I wonder why the condition filters out all rows.
Reply
#5
(Dec-12-2017, 01:22 PM)metalray Wrote: Can someone help with this? I wonder why the condition filters out all rows.
As always with Pandas question is simpler to answer question if provide sample input data that can be run.

Here generate dates from 2013 to 2017.
Take out dates from 2015 to 2016.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((60,3)))
df['date'] = pd.date_range('2013-1-1', periods=60, freq='M')
mask = (df['date'] >= '09-01-2015') & (df['date'] <= '11-30-2016')
print(df.loc[mask])
Output:
           0         1         2       date 32  0.141371  0.146690  0.569466 2015-09-30 33  0.737064  0.233668  0.873330 2015-10-31 34  0.436252  0.494779  0.774684 2015-11-30 35  0.972824  0.590000  0.590881 2015-12-31 36  0.755075  0.409778  0.429689 2016-01-31 37  0.854687  0.807047  0.369351 2016-02-29 38  0.032468  0.362019  0.601016 2016-03-31 39  0.675107  0.063326  0.681630 2016-04-30 40  0.746973  0.055325  0.998878 2016-05-31 41  0.175564  0.967490  0.873183 2016-06-30 42  0.036599  0.011311  0.980002 2016-07-31 43  0.827632  0.246903  0.251752 2016-08-31 44  0.084068  0.974440  0.013576 2016-09-30 45  0.414893  0.853023  0.154171 2016-10-31 46  0.950556  0.973976  0.890843 2016-11-30
Reply
#6
Got it. Thanks!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Pandas read csv file in 'date/time' chunks MorganSamage 4 1,649 Feb-13-2023, 11:24 AM
Last Post: MorganSamage
Smile How to further boost the data read write speed using pandas tjk9501 1 1,231 Nov-14-2022, 01:46 PM
Last Post: jefsummers
  Pandas Dataframe Filtering based on rows mvdlm 0 1,396 Apr-02-2022, 06:39 PM
Last Post: mvdlm
  [Pandas] Help with finding only non-matching rows LowEnd 3 3,462 Nov-12-2021, 02:34 PM
Last Post: jefsummers
  Can't read text file with pandas zinho 6 12,011 May-24-2020, 06:13 AM
Last Post: azajali43
  Read json array data by pandas vipinct 0 1,902 Apr-13-2020, 02:24 PM
Last Post: vipinct
  pandas head() not reading all rows naab 0 1,777 Apr-07-2020, 01:06 PM
Last Post: naab
  getting trailing zeros with 1 during pandas read fullstop 1 3,561 Jan-05-2020, 04:01 PM
Last Post: ichabod801
  How to add a few empty rows into a pandas dataframe python_newbie09 2 16,288 Sep-20-2019, 08:52 AM
Last Post: python_newbie09
  Read Nested JSON with pandas.io.json palo173 4 9,531 Apr-29-2019, 01:25 PM
Last Post: palo173

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020