I have a dataset of indexed timeseries data in CSV format that I'm reading into a pandas DataFrame, specifying the column of time entries as the index:
    import pandas as pd

    df = pd.read_csv(filename, header=1, index_col=1)
    df.index = pd.to_datetime(df.index)

This will be part of a batch processing algorithm that opens the file, checks if any part of the timeseries is within a specified time period, and then either continues with other files in the directory (if no data in the period), or carries out further processing (if in specified range).
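For context, the surrounding batch loop might look roughly like the sketch below. The names `load_timeseries`, `process_directory`, `has_data_in_period` and `process` are hypothetical, and the `*.csv` glob pattern is an assumption about how the files are named. The check itself (`has_data_in_period`) is passed in as a function because it is the part I'm missing.

```python
from pathlib import Path

import pandas as pd


def load_timeseries(path):
    """Read one file as in the snippet above and parse the index as datetimes."""
    df = pd.read_csv(path, header=1, index_col=1)
    df.index = pd.to_datetime(df.index)
    return df


def process_directory(directory, has_data_in_period, process):
    """Run `process` on every file for which `has_data_in_period(df)` is True.

    Returns the names of the files that were processed.
    """
    processed = []
    for path in sorted(Path(directory).glob("*.csv")):
        df = load_timeseries(path)
        if not has_data_in_period(df):
            continue  # no samples in the period: move on to the next file
        process(df)
        processed.append(path.name)
    return processed
```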
The timeseries index is in the format yyyy:mm:dd hh:mm:ss.ms, with freq='100ms' (i.e. in DSP terms, the sampling frequency is 10 Hz and the period, or sampling interval, is 100 ms).
The time period to check against must be a NON DATE-SPECIFIC range between two times of day, in this case 23:00:00-07:00:00 (i.e. an 8 h period spanning two dates).
It is important that the range checked against is a time-of-day range and not date-specific, as the files could have any date. I don't want to remove the date information from the index (and I'm not sure that's even possible), as it will be useful later in the process.
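To make the requirement concrete, this is the logic I'm after, sketched on a tiny synthetic index with plain `datetime.time` comparisons. The sample dates and the names `period_start`, `period_end`, `in_period` and `has_data` are made up for illustration, and I don't know whether a Python-level loop like this is the idiomatic pandas way (it will probably be slow on 10 Hz data).

```python
import datetime as dt

import numpy as np
import pandas as pd

# Four made-up samples around the edges of the period, on arbitrary dates
index = pd.to_datetime([
    "2015-03-01 22:59:59.900",  # just before the period starts
    "2015-03-01 23:00:00.000",  # first sample inside the period
    "2015-03-02 06:59:59.900",  # last sample inside the period
    "2015-03-02 07:00:00.000",  # just after the period ends
])

period_start = dt.time(23, 0)  # 23:00:00
period_end = dt.time(7, 0)     # 07:00:00 on the following day

# The period wraps past midnight, so a time of day is inside it if it is
# at or after the start OR before the end. The dates are never consulted.
in_period = np.array([t >= period_start or t < period_end for t in index.time])

# The per-file check: does any sample fall inside the period?
has_data = bool(in_period.any())
```

The index keeps its full date information throughout; only the time-of-day part of each entry is used in the comparison.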
WHAT HAVE I ALREADY CONSIDERED?
I have tried to create a Boolean mask for the data using timestamps, e.g.:

    periodstart = pd.Timestamp('23:00:00.000')
    periodend = pd.Timestamp('06:59:59.900')
    mask = (df.index.time >= periodstart) & (df.index.time <= periodend)

This doesn't work, because pd.Timestamp fills in the current date from the clock. I need the algorithm to be non-date-specific, as it will be operated in a batch application on data covering many days.
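A quick way to see the problem is to inspect what `pd.Timestamp` builds from a time-only string. This is only a sketch of the behaviour as I understand it; `time_only_start` is a name I've made up for comparison.

```python
import datetime as dt

import pandas as pd

periodstart = pd.Timestamp("23:00:00.000")

# The parsed value is a full date-time: a calendar date has been filled in
print(periodstart)

# Only this part is the time of day I actually want to compare against
print(periodstart.time())

# A time-only bound can be built directly, with no date attached
time_only_start = dt.time(23, 0, 0)
print(periodstart.time() == time_only_start)
```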
I considered identifying the dates in the datetime index for each datafile:
    datesinseries = pd.Series(df.index).map(lambda t: t.date()).unique()

and using these to generate the timestamps, but this seems very cumbersome, indicating there is probably a much simpler way. It could also create a problem if the number of days covered in the datafiles varies beyond 2 (not likely with these data but I'd prefer not to create problems further down the road).
I have also tried to form a comparison series set using
    df.index.time
    period = pd.DatetimeIndex(start='23:00:00',end='07:00:00',freq='100ms')
    mask = df.index.time in period

which returns a single Boolean 'False' whether or not the times are in the period specified. I think this syntax is fundamentally wrong, as it treats a datetime index as if it were a list object, when it is a type of array.
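The difference between the two kinds of test can be seen in a small sketch. Python's `in` answers one yes/no question about one value, whereas `Series.isin` tests every element. The sample times and the `allowed` list are made up for illustration, and listing every 100 ms step of an 8 h period this way would clearly be impractical.

```python
import datetime as dt

import pandas as pd

index = pd.to_datetime([
    "2015-03-01 22:59:59.900",
    "2015-03-01 23:00:00.000",
    "2015-03-01 23:00:00.100",
])
times = pd.Series(index.time)  # object Series of datetime.time values

# A short, hypothetical list of permitted times of day
allowed = [dt.time(23, 0, 0), dt.time(23, 0, 0, 100000)]

# `in` on a plain list: a single Boolean for a single value
single_answer = dt.time(23, 0, 0) in allowed

# `isin`: one Boolean per sample, which is what a mask needs
per_sample = times.isin(allowed)
```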
Finally, I've had a look at pandas.Period, pandas.period_range, pandas.Timedelta and a load of other stuff in the pandas documentation! There is a lot there, and I'm only just starting out with Python, let alone pandas, so I could do with an experienced helping hand!
Any suggestions for forming this check?
Thanks