Python Forum
Calculating median value from time data series
#1
I need to calculate the median of an array of time values. I can manage it with numeric values, but with datetime values it is a real headache. Can someone explain how to do this? Some data format conversion seems to be needed, but I can't figure out which.
And which library (NumPy or pandas) is the more appropriate and efficient way to calculate a median?

>>> import pandas as pd
>>> import numpy as np
CREATE DATAFRAMES
>>> df1 = pd.DataFrame({'Value': [1, 2, 3]})
>>> df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})
NUMPY NUMERIC MEDIAN
>>> numpy_numeric_median = np.median(df1)
>>> print(numpy_numeric_median)
2.0
PANDAS NUMERIC MEDIAN
>>> pandas_numeric_median = df1['Value'].median()
>>> print(pandas_numeric_median)
2.0
NUMPY TIME MEDIAN
>>> numpy_time_median = np.median(df2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'

>>> df2_datetime_format = np.array(pd.to_datetime(df2['Value']), dtype=np.datetime64)
>>> df2_datetime_format
array(['2018-08-21T02:00:00.000000000', '2018-08-21T03:00:00.000000000', '2018-08-21T04:00:00.000000000'], dtype='datetime64[ns]')
>>> numpy_time_median = np.median(df2_datetime_format)
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')
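For reference, a minimal NumPy workaround, assuming the times are already a `datetime64[ns]` array like the one above (the sample values here are illustrative): `np.median` cannot add datetime64 values, but it can take the median of their underlying integer nanosecond counts, and the result can be cast back.

```python
import numpy as np

# Same kind of array as above: datetime64[ns] values on a single date.
times = np.array(['2018-08-21T02:00:00', '2018-08-21T03:00:00', '2018-08-21T04:00:00'],
                 dtype='datetime64[ns]')

# np.median cannot add datetime64 values, so work on their int64 representation
# (nanoseconds since 1970-01-01), then convert the result back to datetime64.
# The median comes back as float64, so for present-day timestamps it may be off
# by up to a few hundred nanoseconds (float64 cannot represent every int64).
median_ns = np.median(times.astype('int64'))
numpy_time_median = np.datetime64(int(round(median_ns)), 'ns')
print(numpy_time_median)
```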
PANDAS TIME MEDIAN
>>> pandas_time_median = df2['Value'].median()
TypeError: could not convert string to float: '04:00:00'

>>> df2_datetime_format = pd.to_datetime(df2['Value'])
>>> df2_datetime_format
0   2018-08-21 02:00:00
1   2018-08-21 03:00:00
2   2018-08-21 04:00:00
Name: Value, dtype: datetime64[ns]

>>> pandas_time_median = df2_datetime_format['Value'].median()
TypeError: an integer is required

>>> pandas_time_median = df2_datetime_format.median()
TypeError: reduction operation 'median' not allowed for this dtype
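If the values are plain clock times, one possible pandas route (a sketch, not the only option) is to read them as durations since midnight with `pd.to_timedelta` instead of `pd.to_datetime`, since pandas supports `median()` on timedelta columns. This assumes all times fall within a single day.

```python
import pandas as pd

df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})

# Read each 'HH:MM:SS' string as a duration (time since midnight) instead of a
# datetime; pandas allows median() on timedelta64 columns.
durations = pd.to_timedelta(df2['Value'])
pandas_time_median = durations.median()
print(pandas_time_median)

# With an even number of values the median is the midpoint of the two middle ones.
even_median = pd.to_timedelta(pd.Series(['02:00:00', '03:00:00'])).median()
print(even_median)
```

If a full timestamp is needed afterwards, the median duration can be added to a date, e.g. `pd.Timestamp('2018-08-21') + pandas_time_median`. Times that wrap past midnight would need their dates kept, so this shortcut would not apply there.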
#2
I don't know whether there is a more convenient way to do this, but simply sorting the values and taking the middle one (or the two middle ones for an even-length sequence) is straightforward.

>>> length = len(df2_datetime_format)
>>> length
3
>>> df2_datetime_format
0   2018-08-22 02:00:00
1   2018-08-22 03:00:00
2   2018-08-22 04:00:00
Name: Value, dtype: datetime64[ns]
>>> sorted(df2_datetime_format)[length//2]
Timestamp('2018-08-22 03:00:00')
Not sure if that is sufficient for your needs.
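The indexing `sorted(...)[length//2]` returns the upper of the two middle values when the length is even. A sketch that also averages the two middle values in that case, using only the standard library; `datetime_median` is a hypothetical helper name and the sample times are illustrative:

```python
from datetime import datetime


def datetime_median(values):
    """Median of datetime values; for an even count, the midpoint of the two middle values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of empty sequence")
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the single middle element is the median.
        return ordered[mid]
    lower, upper = ordered[mid - 1], ordered[mid]
    # datetime - datetime gives a timedelta, which can be halved and added back.
    return lower + (upper - lower) / 2


times = [datetime(2018, 8, 22, h) for h in (2, 3, 4, 5)]
print(datetime_median(times))  # 2018-08-22 03:30:00
```

Because pandas `Timestamp` supports the same subtraction, halving and addition, the helper should also accept the `df2_datetime_format` Series from the thread.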