Python Forum
How Do I Only Get the Year from Date and Isolate Data for Year? - Printable Version




How Do I Only Get the Year from Date and Isolate Data for Year? - WhatsupSmiley - Apr-10-2020

Hi,

I am working with a pandas dataframe which has a date column, called Occurrence Year. I am trying to only get the year for that column (drop the month and day) so that I can pull the data that is for the year 2015.

This is what the original dataframe looks like...
[Image: Year.PNG]

And this is what I tried:
[Image: Year_split.PNG]
Which gives me NaN values in my column.

So I tried using the .apply code:
NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
which then gave me the error:
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-47-ef12f8aeb1c4> in <module>
      2 #NYCrime['Occurrence Year'].astype(str)
      3 #NYCrime['Occurrence Year'].dtype
----> 4 NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
      5
      6 #IGNORE TCrimedf['Neighbourhood'] = TCrimedf['Neighbourhood'].str.rstrip('(0123456789)')

~/conda/envs/python/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-47-ef12f8aeb1c4> in <lambda>(x)
      2 #NYCrime['Occurrence Year'].astype(str)
      3 #NYCrime['Occurrence Year'].dtype
----> 4 NYCrime['Year'] = NYCrime['Occurrence Year'].apply(lambda x: x[:-4])
      5
      6 #IGNORE TCrimedf['Neighbourhood'] = TCrimedf['Neighbourhood'].str.rstrip('(0123456789)')

TypeError: 'float' object is not subscriptable

So I tried to use the .astype to change the values to string...
NYCrime['Occurrence Year'].astype(str)
and ran the .apply code again, but still got the same error.

I checked the data type of the column (NYCrime['Occurrence Year'].dtype) and got dtype('O').

Can someone help me, please? I'm at a loss as to how to figure this out.


RE: How Do I Only Get the Year from Date and Isolate Data for Year? - deanhystad - Apr-13-2020

split returns a list. When you split the date on '/', it returned ['12', '31', '2015'], so the year is the last element (index 2, or -1). You looked at the wrong index.
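
To illustrate the indexing, here is a minimal sketch. The sample dates are made up, since the original column was only posted as an image, and the variable names are just for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real column was only shown as an image.
dates = pd.Series(["12/31/2015", "10/31/2014"])

# Plain Python: str.split returns a list, and the year is the last element.
parts = "12/31/2015".split("/")
year_from_str = parts[-1]  # same as parts[2]

# pandas: .str.split gives one list per row; .str[-1] picks the last piece.
years = dates.str.split("/").str[-1]
print(years.tolist())
```

Note that the result is still a string ('2015'), not a number, so a comparison would have to be against '2015' unless the column is converted first.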

Your lambda didn't work because at least one value in the column is not a string. The error message says it is a float.

I am pretty sure pandas has built-in functions for extracting the information you want. Read about the pandas date/time functions.
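
A possible explanation for the float, sketched below under assumptions: a column with dtype object can still hold missing values, and pandas stores those as NaN, which is a float. The DataFrame here is a made-up stand-in for NYCrime, not the real data. The sketch also shows that astype returns a new Series, so calling it without assigning the result leaves the column unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for NYCrime; one row has a missing date.
df = pd.DataFrame(
    {"Occurrence Year": pd.Series(["12/31/2015", np.nan, "10/31/2014"], dtype=object)}
)

# The dtype is object, yet the missing entry is a float NaN,
# so slicing it with x[...] raises the TypeError from the question.
col_dtype = df["Occurrence Year"].dtype
missing_type = type(df["Occurrence Year"].iloc[1])

# astype returns a new Series; without assignment the column is unchanged.
df["Occurrence Year"].astype(str)
still_float = isinstance(df["Occurrence Year"].iloc[1], float)

# The .str accessor passes missing values through instead of raising.
# Note x[-4:] (the last four characters), not x[:-4] (everything but them).
df["Year"] = df["Occurrence Year"].str[-4:]
print(df)
```

Assigning the result of astype(str) back to the column would avoid the error, but it would also turn missing values into the string 'nan', so the .str accessor or a datetime conversion is usually the safer route.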


RE: How Do I Only Get the Year from Date and Isolate Data for Year? - snippsat - Apr-14-2020

Convert Occurrence Year to a pandas datetime column; then you can extract the rows for 2015.
Example:
import pandas as pd
from io import StringIO

data = StringIO('''\
occurrence_year,values
10/31/2014,7.0
10/31/2015,2.0
12/31/2015,3.0
10/31/2016,9.0''')

df = pd.read_csv(data, sep=',')
print(df)
Output:
  occurrence_year  values
0      10/31/2014     7.0
1      10/31/2015     2.0
2      12/31/2015     3.0
3      10/31/2016     9.0

# Check types
df.dtypes

occurrence_year     object
values             float64
dtype: object

# Fix date type
df['occurrence_year'] = pd.to_datetime(df['occurrence_year'])
df.dtypes

occurrence_year    datetime64[ns]
values                    float64
dtype: object

# Extract dates for 2015
df[df['occurrence_year'].dt.year == 2015]


  occurrence_year  values
1      2015-10-31     2.0
2      2015-12-31     3.0
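
If a separate year column is wanted, as in the original question, something like the following might work. This is only a sketch: the data is made up, the missing row is an assumption about what the real file may contain, and the column name year is chosen for illustration.

```python
import pandas as pd

# Hypothetical data in the same shape as the example above, plus one missing date.
df = pd.DataFrame({
    "occurrence_year": ["10/31/2014", "10/31/2015", "12/31/2015", None],
    "values": [7.0, 2.0, 3.0, 9.0],
})

# An explicit format avoids month/day ambiguity; errors="coerce" turns
# missing or unparseable entries into NaT instead of raising.
df["occurrence_year"] = pd.to_datetime(
    df["occurrence_year"], format="%m/%d/%Y", errors="coerce"
)

# Keep the year as its own column (NaN where the date is missing).
df["year"] = df["occurrence_year"].dt.year

# Select the 2015 rows.
df_2015 = df[df["year"] == 2015]
print(df_2015)
```

Rows whose date could not be parsed end up with a missing year and simply drop out of the 2015 selection.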