Calculating median value from time data series - Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-12366.html
Calculating median value from time data series - mkaru - Aug-21-2018

I need to calculate the median value of a time data array. I can manage it with numeric values, but in datetime format it is a real headache. Can someone explain how to do this? Some data format conversions are needed, but I can't figure out which ones. And which one (numpy or pandas) is the more appropriate and efficient way to calculate a median value?

>>> import pandas as pd
>>> import numpy as np

CREATE DATAFRAMES
>>> df1 = pd.DataFrame({'Value': [1, 2, 3]})
>>> df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})

NUMPY NUMERIC MEDIAN
>>> numpy_numeric_median = np.median(df1)
>>> print(numpy_numeric_median)
2.0

PANDAS NUMERIC MEDIAN
>>> pandas_numeric_median = df1['Value'].median()
>>> print(pandas_numeric_median)
2.0

NUMPY TIME MEDIAN
>>> numpy_time_median = np.median(df2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>> df2_datetime_format = np.array(pd.to_datetime(df2['Value']), dtype=np.datetime64)
array(['2018-08-21T02:00:00.000000000', '2018-08-21T03:00:00.000000000',
       '2018-08-21T04:00:00.000000000'], dtype='datetime64[ns]')
>>> numpy_time_median = np.median(df2_datetime_format)
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

PANDAS TIME MEDIAN
>>> pandas_time_median = df2['Value'].median()
TypeError: could not convert string to float: '04:00:00'
>>> df2_datetime_format = pd.to_datetime(df2['Value'])
0   2018-08-21 02:00:00
1   2018-08-21 03:00:00
2   2018-08-21 04:00:00
Name: Value, dtype: datetime64[ns]
>>> pandas_time_median = df2_datetime_format['Value'].median()
TypeError: an integer is required
>>> pandas_time_median = df2_datetime_format.median()
TypeError: reduction operation 'median' not allowed for this dtype

RE: Calculating median value from time data series - Mekire - Aug-22-2018

I don't know whether or not there is some convenient way to do this, but simply sorting them and finding the middle (or 2 middles in
the case of an even length sequence) is straightforward.

>>> length = len(df2_datetime_format)
>>> length
3
>>> df2_datetime_format
0   2018-08-22 02:00:00
1   2018-08-22 03:00:00
2   2018-08-22 04:00:00
Name: Value, dtype: datetime64[ns]
>>> sorted(df2_datetime_format)[length//2]
Timestamp('2018-08-22 03:00:00')

Not sure if that is sufficient for your needs.
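
The one-liner above picks the middle element, which covers odd-length input. A minimal sketch of the "2 middles" case follows, using only the standard library. The function name `median_datetime` is made up for illustration, and taking the midpoint of the two middle values is an assumption about what the median should be for an even count. Two datetimes cannot be added (the same restriction behind the `ufunc add` error in the question), so the midpoint is built from their difference instead.

```python
from datetime import datetime


def median_datetime(values):
    """Return the median of a non-empty iterable of datetime-like values.

    Hypothetical helper, not part of pandas or numpy.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median_datetime() needs at least one value")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the middle element is the median.
        return ordered[mid]
    # Even count: datetimes cannot be summed, but their difference
    # (a timedelta) can be halved and added back to the lower value.
    lower, upper = ordered[mid - 1], ordered[mid]
    return lower + (upper - lower) / 2


times = [datetime(2018, 8, 22, hour) for hour in (4, 2, 3, 5)]
print(median_datetime(times))  # 2018-08-22 03:30:00
```

Since `pandas.Timestamp` is a subclass of `datetime.datetime`, passing `df2_datetime_format` from the thread should work the same way, though that has not been checked against the pandas version used in the posts.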