Python Forum

Full Version: Newbie Help with Datetime formats and numPy
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I am just getting started on my journey of learning Python and have hit a wall trying to replicate a scenario that I commonly achieve using SAS.

I am reading in a .csv file which contains "Start" and "Complete" dates for projects. Once read in, I need to simply need to calculate the number of business days between the two dates.

Output:
example csv file. Title Start_Date Complete_Date ----- ---------- ------------- projTitle 1/19/20 1/26/20 projTitle2 2/11/20 2/15/20
Here is what I've done.

#read list csv file
parse_dates = ['Start_Date', 'Complete_Date']
df = pd.read_csv('sharepoint_requests_tst.csv', parse_dates=parse_dates)

#new column for bus day count
df['TAT'] = np.busday_count(df['Start_Date'], df['Complete_Date'])

Error I get:
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[ns]') to dtype('<M8[D]') according to the rule 'safe'
What am I missing in this logic?

Thank you in advance for any help
I'm new too and I ask much more than I answer at this point.

However, I was messing around with dates and datetimes today because my current project makes use of them.

Dates and datetimes are different. Whether you import datetime and/or date from datetime can make a difference. There's also timedelta, which may be imported (from datetime) and can be used to add/subtract dates to get resultant time intervals. You can change from datetime to integer by taking .day attribute of a datetime object, for example.

I would suggest playing around with all these things to find exactly what may suit your needs. I'm not sure parse_dates=parse_dates is enough to cover everything.
See this SO post:

https://stackoverflow.com/questions/3191...with-numba

Essentially pandas store all datetimes in datetime64[ns] format only (i.e. down to nanoseconds), but busday_count requires datetimes in datetime64[D] format.

One option is to convert the dates to datetime64[D] format and store it as a numpy array. Something like:

import pandas as pd
import numpy as np

df = pd.read_csv('test11.csv', parse_dates = [1,2])
start_dates = np.array(df['Start_Date'].values.astype('datetime64[D]'))
complete_dates = np.array(df['Complete_Date'].values.astype('datetime64[D]'))
df['TAT'] = np.busday_count(start_dates, complete_dates)
print(df)
Output:
Title Start_Date Complete_Date TAT 0 projTitle 2020-01-19 2020-01-26 5 1 projTitle2 2020-02-11 2020-02-15 4