is a pandas dataframe timeseries time index in a specified range (but ignoring date)?

m_lotinga · (This post was last modified: Dec-09-2016, 07:36 PM by m_lotinga.)

I have a dataset of indexed timeseries data in csv file format that I'm reading to a pandas dataframe, and specifying the index as the column of time entries:

import pandas as pd
df = pd.read_csv(filename,header = 1,index_col = 1)
df.index = pd.to_datetime(df.index)

This will be part of a batch processing algorithm that opens the file, checks if any part of the timeseries is within a specified time period, and then either continues with other files in the directory (if no data in the period), or carries out further processing (if in specified range).

The timeseries index is in the format yyyy:mm:dd hh:mm:ss.ms, with freq='100ms' (i.e. in DSP terms, the sampling frequency is 10 Hz and the period, or sampling interval, is 100 ms).

The specified time period ranges for checking against must be NON DATE-SPECIFIC ranges between two time bounds, in this case the period

23:00:00-07:00:00 (i.e. an 8 h period spanning two dates)

It is important that the range checked against is time-specific and not date-specific, as the files could have any date. I don't want to remove the date information from the index (and I'm not sure that's even possible), as it will be useful later in the process.

WHAT HAVE I ALREADY CONSIDERED?

I have tried to create a Boolean mask for the data using timestamps, eg:

periodstart = pd.Timestamp('23:00:00.000')
periodend = pd.Timestamp('06:59:59.900')
mask = (df.index.time >= periodstart) & (df.index.time <= periodend)

This doesn't work, because the timestamps insert the current date on the clock. I need the algorithm to be non-date specific, as it will be operated in a batch application on data covering many days.
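One possible way around the date problem, sketched here on the assumption that df.index is a DatetimeIndex: compare df.index.time against plain datetime.time objects, which carry no date at all. Because the period wraps past midnight, the two conditions are joined with | (or) rather than & (and). The sample index and the names period_start and period_end are made up for illustration.

```python
import datetime

import pandas as pd

# Made-up sample data standing in for one file's contents.
idx = pd.to_datetime([
    "2016-12-08 22:59:59.900",
    "2016-12-08 23:00:00.000",
    "2016-12-09 03:30:00.000",
    "2016-12-09 06:59:59.900",
    "2016-12-09 07:00:00.000",
    "2016-12-09 12:00:00.000",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# datetime.time objects hold a time of day only, so no date is inserted.
period_start = datetime.time(23, 0, 0)
period_end = datetime.time(7, 0, 0)

# df.index.time is an array of datetime.time objects, one per row.
times = df.index.time

# The period spans midnight, so a row is inside it if it is at or after
# the start OR before the end.
mask = (times >= period_start) | (times < period_end)

print(mask.any())   # does any part of the file fall in the period?
print(df[mask])     # the rows inside the period
```

For a period that does not cross midnight, the two conditions would be joined with & instead.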

I considered identifying the dates in the datetime index for each datafile:

datesinseries = pd.Series(df.index).map(lambda t: t.date()).unique()

and using these to generate the timestamps, but this seems very cumbersome, indicating there is probably a much simpler way. It could also create a problem if the number of days covered in the datafiles varies beyond 2 (not likely with these data but I'd prefer not to create problems further down the road).

I have also tried to form a comparison series set using

df.index.time
period = pd.DatetimeIndex(start='23:00:00',end='07:00:00',freq='100ms')
mask = df.index.time in period

which returns a single Boolean 'False' regardless of whether the times are in the period specified. I think this syntax is fundamentally wrong, as it treats a datetime index as if it were a list object, when it is a type of array.
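The in operator always evaluates to one Boolean, which is why no per-row mask comes back. As a possible alternative, pandas provides DatetimeIndex.indexer_between_time (and the related DataFrame.between_time), which select rows by time of day regardless of date and treat a start time later than the end time as a range running over midnight. A minimal sketch with made-up sample data; the variable names are illustrative only:

```python
import datetime

import pandas as pd

# Made-up sample data standing in for one file's contents.
idx = pd.to_datetime([
    "2016-12-08 22:59:59.900",
    "2016-12-08 23:00:00.000",
    "2016-12-09 03:30:00.000",
    "2016-12-09 06:59:59.900",
    "2016-12-09 07:00:00.000",
    "2016-12-09 12:00:00.000",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Both bounds are included by default, so the end is set to the last
# 100 ms sample before 07:00:00.
start = datetime.time(23, 0, 0)
end = datetime.time(6, 59, 59, 900000)

# Integer positions of the rows whose time of day lies in the period.
positions = df.index.indexer_between_time(start, end)

has_data_in_period = len(positions) > 0
in_period = df.iloc[positions]

print(has_data_in_period)
print(in_period)
```

The dates stay in the index throughout, so they remain available for later processing.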

Finally, I've had a look at pandas.Period, pandas.period_range, pandas.Timedelta and a load of other stuff in the pandas documentation! There is a lot there, and I'm only just starting out with Python, let alone pandas, so I could do with an experienced helping hand!

Any suggestions for forming this check?

Thanks

**Yoriz** · (This post was last modified: Dec-09-2016, 08:17 PM by Yoriz.)

Can you not get the date from the obtained csv data and then set that date on the timestamps used for the check?

m_lotinga · (This post was last modified: Dec-09-2016, 09:01 PM by m_lotinga.)

Thanks for your reply.

It's possible using

datesinseries = pd.Series(df.index).map(lambda t: t.date()).unique()

but a) it seems a roundabout way to do it, and b) I'm not sure how to combine a date extracted this way with a manually-defined time in a single timestamp as bounds for the range. Any advice on this would be helpful.

Do I just combine strings?

The .unique approach returns date objects, and it isn't obvious how to combine them with a time into a timestamp object.
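For combining an extracted date with a manually defined time, joining strings shouldn't be necessary. One option is datetime.datetime.combine from the standard library, whose result can be passed to pd.Timestamp. The sketch below uses made-up sample data and only builds the bounds for the night starting on the first date in the file; the variable names are illustrative.

```python
import datetime

import pandas as pd

# Made-up sample data standing in for one file's contents.
idx = pd.to_datetime([
    "2016-12-08 22:30:00",
    "2016-12-08 23:30:00",
    "2016-12-09 02:00:00",
    "2016-12-09 09:00:00",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Date of the first row, as a datetime.date object.
first_day = df.index[0].date()

# Combine that date with a fixed time of day to get the lower bound.
period_start = pd.Timestamp(
    datetime.datetime.combine(first_day, datetime.time(23, 0, 0))
)

# The period lasts 8 hours, so the upper bound falls on the next date.
period_end = period_start + pd.Timedelta(hours=8)

mask = (df.index >= period_start) & (df.index < period_end)
print(mask)
```

A file covering several nights, or one that starts after midnight, would need one pair of bounds per night, which is where this route starts to get cumbersome.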

**Yoriz** · Dec-09-2016, 09:21 PM

How about adding an hour to the timestamp and then checking whether the time alone is between 00:00:00 and 07:59:59?

import datetime
import pandas

timestamps = ('2000-01-01 02:30:45',
              '2005-01-01 23:30:45',
              '2010-01-01 7:30:45',
              '2016-01-01 16:30:45')

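for time_stamp in timestamps:
    # Shift by one hour so the 23:00-07:00 period maps onto 00:00-08:00
    # and no longer wraps past midnight.
    shifted = pandas.Timestamp(time_stamp) + datetime.timedelta(hours=1)
    # The start bound is inclusive so that exactly 23:00:00 counts; the end
    # bound is just below 08:00:00 so sub-second samples after 07:59:59 count.
    between = datetime.time(0, 0, 0) <= shifted.time() < datetime.time(8, 0, 0)
    print(between)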

Output:
True
True
False
False

m_lotinga · Dec-12-2016, 10:51 PM

Thanks, I like this idea as it's quite practical.

One issue I'm finding is that my DatetimeIndex is an immutable array of datetime objects. There doesn't seem to be any simple way to convert this directly to a mutable array to do the addition.

So far I've tried copying the whole DatetimeIndex to a dummy array, but haven't managed to get this to work, as Python seems very resistant to datetime gymnastics. I've also tried stepping through each value in the DatetimeIndex to convert it to a timestamp, which works, but I can't seem to compile the Boolean values into a single mask array to then apply back on to the index. In your example above, 'between' returns an individual value at each loop iteration.

I found a class someone created on GitHub for these types of problems, which may explain why I'm finding issues. But I'm trying to minimise dependencies!

https://github.com/tgs/nptime/blob/master/nptime.py
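On the immutability point: arithmetic on a DatetimeIndex returns a new index rather than modifying the original, so the one-hour shift can be applied to the whole index in one step, with no loop and no extra dependency. A minimal sketch of that idea with made-up sample data; after the shift, the 23:00-07:00 period becomes 00:00-08:00, so checking the hour is enough:

```python
import pandas as pd

# Made-up sample data standing in for one file's contents.
idx = pd.to_datetime([
    "2016-12-08 22:59:59.900",
    "2016-12-08 23:00:00.000",
    "2016-12-09 03:30:00.000",
    "2016-12-09 06:59:59.900",
    "2016-12-09 07:00:00.000",
    "2016-12-09 12:00:00.000",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Adding a Timedelta builds a new, shifted index; df.index is left untouched.
shifted = df.index + pd.Timedelta(hours=1)

# One Boolean per row: True where the shifted time is before 08:00,
# i.e. where the original time lies in 23:00:00 up to (not including) 07:00:00.
mask = shifted.hour < 8

print(mask.any())   # does any part of the file fall in the period?
print(df[mask])     # rows in the period, still with their original timestamps
```

The mask lines up row for row with df, so it can be applied directly, and mask.any() gives the single yes/no answer needed to decide whether to process a file.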
