Python Forum

Calculation using group by and pandas
I want to group the data below by "datetime" and calculate (1) the duration of the delays and (2) the number of delays.

Here is the data:

coded_direction,delay,datetime
1,-0.0020,2016-01-01 00:00
1,-0.0021,2016-01-01 00:01
1,0.0004,2016-01-01 00:01
1,-0.0025,2016-01-01 00:02
1,0.0026,2016-01-01 00:02
1,-0.0028,2016-01-01 00:02
1,0.0025,2016-01-01 00:03
1,-0.0025,2016-01-01 00:04
1,0.0025,2016-01-01 00:05

The desired result looks like this:

A0 = [2016-01-01 00:01, 2016-01-01 00:02, 2016-01-01 00:03, 2016-01-01 00:05]
A0: list of the datetime groups that experience a delay (at least one delay > 0)
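A minimal sketch of A0 with pandas, assuming the sample above is loaded from an inline CSV string (the `CSV` variable and the use of `io.StringIO` are only for illustration). The datetimes are the index of the grouped result, so filtering the boolean Series and taking its index gives the list directly:

```python
import io

import pandas as pd

# Sample data from the post, embedded as CSV for illustration.
CSV = """coded_direction,delay,datetime
1,-0.0020,2016-01-01 00:00
1,-0.0021,2016-01-01 00:01
1,0.0004,2016-01-01 00:01
1,-0.0025,2016-01-01 00:02
1,0.0026,2016-01-01 00:02
1,-0.0028,2016-01-01 00:02
1,0.0025,2016-01-01 00:03
1,-0.0025,2016-01-01 00:04
1,0.0025,2016-01-01 00:05
"""

data = pd.read_csv(io.StringIO(CSV), parse_dates=["datetime"])

# One boolean per datetime group: True if any delay in that group is > 0.
has_delay = data["delay"].gt(0).groupby(data["datetime"]).any()

# The group keys (datetimes) are the index; keep those flagged True.
A0 = has_delay[has_delay].index.tolist()

# Same format as in the post.
A0_str = [t.strftime("%Y-%m-%d %H:%M") for t in A0]
print(A0_str)
```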

Next, I need to remove the one-minute delays, since a delay should last more than 1 minute. The result would be:
A1 = [2016-01-01 00:01, 2016-01-01 00:03]
A1: A0 after removing one-minute delays
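One way to get A1, assuming consecutive delayed minutes form a single delay event and that a "one-minute delay" means an event made of a single delayed minute (00:05 here). Under that reading, A1 holds the start and end of each remaining event. The sketch starts from A0 as listed above; `run_id` and `runs` are names chosen for this example only:

```python
import pandas as pd

# A0 as given in the post: datetimes with at least one positive delay.
A0 = pd.to_datetime([
    "2016-01-01 00:01",
    "2016-01-01 00:02",
    "2016-01-01 00:03",
    "2016-01-01 00:05",
])

times = pd.Series(A0).sort_values().reset_index(drop=True)

# A new run starts wherever the gap to the previous delayed minute is not
# exactly one minute (the first diff is NaT, which also counts as a break).
new_run = times.diff() != pd.Timedelta(minutes=1)
run_id = new_run.cumsum()

# Start (min) and end (max) of every run of consecutive delayed minutes.
runs = times.groupby(run_id).agg(["min", "max"])

# Assumption: drop runs that consist of a single delayed minute.
runs = runs[runs["max"] > runs["min"]]

# Flatten to [start, end, start, end, ...] like A1 in the post.
A1 = [t for start, end in runs.itertuples(index=False) for t in (start, end)]
print([t.strftime("%Y-%m-%d %H:%M") for t in A1])
```

If the intended rule is instead "end minus start must be longer than one minute", the filter line would become `runs = runs[runs["max"] - runs["min"] > pd.Timedelta(minutes=1)]`; both readings give the same A1 on this sample.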
Finally, I will calculate the delay duration:
A2 = 2016-01-01 00:03 - 2016-01-01 00:01 = 2 minutes
and the delay count: A3 = COUNT(2016-01-01 00:03, 2016-01-01 00:01) = 1
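A sketch of A2 and A3, assuming A1 is read as (start, end) pairs with one pair per delay event. The post only shows a single event, so taking A2 as the total duration over all events is an assumption:

```python
import pandas as pd

# A1 from the post, read as consecutive (start, end) pairs.
A1 = pd.to_datetime(["2016-01-01 00:01", "2016-01-01 00:03"])
events = list(zip(A1[0::2], A1[1::2]))

# A2: duration of each event (end - start), summed over all events.
durations = [end - start for start, end in events]
A2 = sum(durations, pd.Timedelta(0))
A2_minutes = A2 / pd.Timedelta(minutes=1)

# A3: number of delay events, one per (start, end) pair.
A3 = len(events)

print(A2_minutes, A3)  # 2.0 1
```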



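import pandas as pd

# data: DataFrame with the columns coded_direction, delay, datetime shown above
def f(x):
    # True if any delay in this datetime group is positive
    return (x > 0).any()

gb = data.groupby('datetime')
flags = gb['delay'].apply(f)
A0 = [x for x in flags if x]  # but I cannot get the datetimes using this method.
# I am also not sure how to calculate A1, A2, and A3.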
Any help is much appreciated.