Python Forum
How Do I Only Get the Year from Date and Isolate Data for Year?
#1
Hi,

I am working with a pandas dataframe which has a date column, called Occurrence Year. I am trying to only get the year for that column (drop the month and day) so that I can pull the data that is for the year 2015.

This is what the original dataframe looks like...
[Image: Year.PNG]

And this is what I tried:
[Image: Year_split.PNG]
Which gives me NaN values in my column.

So I tried using the .apply code:
NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
which then gave me the error:
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-47-ef12f8aeb1c4> in <module>
      2 #NYCrime['Occurrence Year'].astype(str)
      3 #NYCrime['Occurrence Year'].dtype
----> 4 NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
      5
      6 #IGNORE TCrimedf['Neighbourhood'] = TCrimedf['Neighbourhood'].str.rstrip('(0123456789)')

~/conda/envs/python/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-47-ef12f8aeb1c4> in <lambda>(x)
      2 #NYCrime['Occurrence Year'].astype(str)
      3 #NYCrime['Occurrence Year'].dtype
----> 4 NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
      5
      6 #IGNORE TCrimedf['Neighbourhood'] = TCrimedf['Neighbourhood'].str.rstrip('(0123456789)')

TypeError: 'float' object is not subscriptable
So I tried to use the .astype to change the values to string...
NYCrime['Occurrence Year'].astype(str)
and run the .apply code again but still got the same error.

I checked the data type of the column with
NYCrime['Occurrence Year'].dtype
and got dtype('O').

Can someone help me, please? I'm at a loss as to how to figure this out.
Reply
#2
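split returns a list. When you split the date on '/' it returned ['12', '31', '2015'], so the year is the last element. You looked at the wrong index.

Your lambda didn't work because not every value in the column is a string. The error message says one of them is a float; in a column with dtype object that is probably NaN, i.e. a missing date.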
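
To make that concrete, here is a small sketch that reproduces the error. The three sample values are made up (I can't see the real data), and I am assuming the float in the column is NaN. It also shows two details from the first post: .astype(str) returns a new Series and does nothing unless the result is assigned, and x[:-4] drops the last four characters while x[-4:] keeps them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for NYCrime['Occurrence Year']: two strings and one missing value.
dates = pd.Series(['12/31/2015', np.nan, '10/31/2014'], dtype=object)

# An object column can hold strings and floats side by side.
value_types = [type(v).__name__ for v in dates]

# Slicing works on the strings but raises TypeError on the float NaN.
try:
    dates.apply(lambda x: x[-4:])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# astype returns a new Series; without an assignment the original is unchanged.
dates.astype(str)
still_has_float = isinstance(dates.iloc[1], float)

# Assign the result to actually use the converted values.
as_text = dates.astype(str)

# x[:-4] drops the last four characters; x[-4:] keeps them.
without_year = as_text.iloc[0][:-4]
only_year = as_text.iloc[0][-4:]

print(value_types, error_message, still_has_float, without_year, only_year)
```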

I am pretty sure pandas has built in functions for extracting the information you want. Read about pandas date/time functions.
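
If you would rather stay with string slicing, the .str accessor is one such built-in: it leaves missing values as missing instead of raising TypeError. A minimal sketch, assuming the dates are MM/DD/YYYY text and using a made-up three-row frame in place of your real NYCrime data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample in MM/DD/YYYY format, with one missing date.
NYCrime = pd.DataFrame({'Occurrence Year': ['12/31/2015', np.nan, '10/31/2014']})

# .str slicing skips missing values instead of raising TypeError.
NYCrime['Year'] = NYCrime['Occurrence Year'].str[-4:]

# Keep only the rows whose year text is '2015'; the missing row compares False.
crimes_2015 = NYCrime[NYCrime['Year'] == '2015']
print(crimes_2015)
```

Note that Year holds text here, so the comparison is against '2015' and not the number 2015.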
Reply
#3
Convert Occurrence Year to a pandas datetime column, then you can extract the rows for 2015.
Example.
import pandas as pd
from io import StringIO

data = StringIO('''\
occurrence_year,values
10/31/2014,7.0
10/31/2015,2.0
12/31/2015,3.0
10/31/2016,9.0''')

df = pd.read_csv(data, sep=',')
print(df)
Output:
  occurrence_year  values
0      10/31/2014     7.0
1      10/31/2015     2.0
2      12/31/2015     3.0
3      10/31/2016     9.0
# Check types
df.dtypes

occurrence_year     object
values             float64
dtype: object

# Fix date type
df['occurrence_year'] = pd.to_datetime(df['occurrence_year'])
df.dtypes

occurrence_year    datetime64[ns]
values                    float64
dtype: object

# Extract dates for 2015
df[df['occurrence_year'].dt.year == 2015]


  occurrence_year  values
1      2015-10-31     2.0
2      2015-12-31     3.0
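
If the real column has missing or malformed entries (which the float error in the first post suggests), a variation like the one below may help. It is only a sketch: the sample rows, the 'unknown' value and the year column name are made up, and the format string assumes MM/DD/YYYY.

```python
import pandas as pd

# Hypothetical sample; the last two dates stand in for missing or unparseable values.
df = pd.DataFrame({
    'occurrence_year': ['10/31/2014', '10/31/2015', None, 'unknown'],
    'values': [7.0, 2.0, 3.0, 9.0],
})

# errors='coerce' turns anything that cannot be parsed into NaT instead of raising.
df['occurrence_year'] = pd.to_datetime(
    df['occurrence_year'], format='%m/%d/%Y', errors='coerce'
)

# Store the year in its own column; rows with NaT get NaN here.
df['year'] = df['occurrence_year'].dt.year

# Select the rows for 2015; NaN never equals 2015, so those rows drop out.
rows_2015 = df[df['year'] == 2015]
print(rows_2015)
```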
Reply

