Aug-21-2018, 06:36 PM
I need to calculate the median of an array of time values. I can manage it with numeric values, but with datetime data it is a real headache. Can someone explain how to do this? Some data format conversion seems to be needed, but I can't figure out which one.
And which one (NumPy or pandas) is the more appropriate and efficient way to calculate the median?
>>> import pandas as pd
>>> import numpy as np

CREATE DATAFRAMES
>>> df1 = pd.DataFrame({'Value': [1, 2, 3]})
>>> df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})

NUMPY NUMERIC MEDIAN
>>> numpy_numeric_median = np.median(df1)
>>> print(numpy_numeric_median)
2.0

PANDAS NUMERIC MEDIAN
>>> pandas_numeric_median = df1['Value'].median()
>>> print(pandas_numeric_median)
2.0

NUMPY TIME MEDIAN
>>> numpy_time_median = np.median(df2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>> df2_datetime_format = np.array(pd.to_datetime(df2['Value']), dtype=np.datetime64)
array(['2018-08-21T02:00:00.000000000', '2018-08-21T03:00:00.000000000',
       '2018-08-21T04:00:00.000000000'], dtype='datetime64[ns]')
>>> numpy_time_median = np.median(df2_datetime_format)
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')

PANDAS TIME MEDIAN
>>> pandas_time_median = df2['Value'].median()
TypeError: could not convert string to float: '04:00:00'
>>> df2_datetime_format = pd.to_datetime(df2['Value'])
0   2018-08-21 02:00:00
1   2018-08-21 03:00:00
2   2018-08-21 04:00:00
Name: Value, dtype: datetime64[ns]
>>> pandas_time_median = df2_datetime_format['Value'].median()
TypeError: an integer is required
>>> pandas_time_median = df2_datetime_format.median()
TypeError: reduction operation 'median' not allowed for this dtype
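
One possible way around the errors above, offered as a sketch rather than the definitive answer: if the strings are read as durations since midnight (an assumption about what the data means), `pd.to_timedelta` converts them to timedeltas, and a timedelta Series can be reduced with `.median()`. For a NumPy route, the timedeltas can be viewed as integer nanoseconds, reduced with `np.median`, and converted back. The names `durations` and `ns` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})

# Assumption: each string is a time of day, read as a duration since midnight.
durations = pd.to_timedelta(df2['Value'])

# pandas route: median of a timedelta Series returns a Timedelta.
pandas_time_median = durations.median()

# NumPy route: work on integer nanoseconds, then convert the result back.
ns = durations.to_numpy().astype('timedelta64[ns]').astype('int64')
numpy_time_median = pd.Timedelta(int(np.median(ns)), unit='ns')

print(pandas_time_median)
print(numpy_time_median)
```

Both routes should give the same three-hour result here. The pandas version is shorter and keeps the dtype handling inside the library, so it is probably the more convenient choice unless the data already lives in a plain NumPy array.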