DF.groupby(col).min works, mean gets a "not implemented" error

samgardner5

I'm running through some data visualization tutorials, and in my Anaconda installation on Windows the pandas mean() function doesn't seem to be working, and I'm not sure why. min() and max() both work with exactly the same line.

I don't think I've failed to import anything, so I'm pretty mystified why it should be failing. Anyone have any ideas?

I've put in a file showing the entire Jupyter notebook I'm working with (it's not long), but the line where it's failing is after creating "mpg" as a dataframe with "model_year" as one of the columns:

#!/usr/bin/env python
# coding: utf-8
# In[1]:

import pandas as pd

# In[2]:
import numpy as np

# In[7]:
mpg = pd.read_csv("mpg.csv")
# In[6]:
get_ipython().run_line_magic('matplotlib', 'inline')
# In[8]:
mpg.head()

# In[9]:
mpgy = mpg.groupby("model_year").mean()["mpg"]

The error message I get is:

Error:
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1489 try:
-> 1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   1493         how,
   1494         axis=data.ndim - 1,
   1495         min_count=min_count,
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
   1501     # try to python agg
   1502     # TODO: shouldn't min_count matter?

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:959, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
    958 ngroups = self.ngroups
--> 959 return cy_op.cython_operation(
    960     values=values,
    961     axis=axis,
    962     min_count=min_count,
    963     comp_ids=ids,
    964     ngroups=ngroups,
    965     **kwargs,
    966 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:657, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
    649     return self._ea_wrap_cython_operation(
    650         values,
    651         min_count=min_count,
   (...)
    654         **kwargs,
    655     )
--> 657 return self._cython_op_ndim_compat(
    658     values,
    659     min_count=min_count,
    660     ngroups=ngroups,
    661     comp_ids=comp_ids,
    662     mask=None,
    663     **kwargs,
    664 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:497, in WrappedCythonOp._cython_op_ndim_compat(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
    495     return res.T
--> 497 return self._call_cython_op(
    498     values,
    499     min_count=min_count,
    500     ngroups=ngroups,
    501     comp_ids=comp_ids,
    502     mask=mask,
    503     result_mask=result_mask,
    504     **kwargs,
    505 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:541, in WrappedCythonOp._call_cython_op(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
    540 out_shape = self._get_output_shape(ngroups, values)
--> 541 func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
    542 values = self._get_cython_vals(values)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:173, in WrappedCythonOp._get_cython_function(cls, kind, how, dtype, is_numeric)
    171 if "object" not in f.__signatures__:
    172     # raise NotImplementedError here rather than TypeError later
--> 173     raise NotImplementedError(
    174         f"function is not implemented for this dtype: "
    175         f"[how->{how},dtype->{dtype_str}]"
    176     )
    177 return f

NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1692, in _ensure_numeric(x)
   1691 try:
-> 1692     x = float(x)
   1693 except (TypeError, ValueError):
   1694     # e.g. "1+1j" or "foo"

ValueError: could not convert string to float: '889095?1001051008810016517515315018017017511072100888690707665696070'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1696, in _ensure_numeric(x)
   1695 try:
-> 1696     x = complex(x)
   1697 except ValueError as err:
   1698     # e.g. "foo"

ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 mpgy = mpg.groupby("model_year").mean()["mpg"]

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1855, in GroupBy.mean(self, numeric_only, engine, engine_kwargs)
   1853     return self._numba_agg_general(sliding_mean, engine_kwargs)
   1854 else:
-> 1855     result = self._cython_agg_general(
   1856         "mean",
   1857         alt=lambda x: Series(x).mean(numeric_only=numeric_only),
   1858         numeric_only=numeric_only,
   1859     )
   1860     return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1507, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, **kwargs)
   1503         result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
   1505     return result
-> 1507 new_mgr = data.grouped_reduce(array_func)
   1508 res = self._wrap_agged_manager(new_mgr)
   1509 out = self._wrap_aggregated_output(res)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:1503, in BlockManager.grouped_reduce(self, func)
   1499 if blk.is_object:
   1500     # split on object-dtype blocks bc some columns may raise
   1501     #  while others do not.
   1502     for sb in blk._split():
-> 1503         applied = sb.apply(func)
   1504         result_blocks = extend_blocks(applied, result_blocks)
   1505 else:

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:329, in Block.apply(self, func, **kwargs)
    323 @final
    324 def apply(self, func, **kwargs) -> list[Block]:
    325     """
    326     apply the function to my values; return a block if we are not
    327     one
    328     """
--> 329     result = func(self.values, **kwargs)
    331     return self._split_op_result(result)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1503, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1490     result = self.grouper._cython_operation(
   1491         "aggregate",
   1492         values,
   (...)
   1496         **kwargs,
   1497     )
   1498 except NotImplementedError:
   1499     # generally if we have numeric_only=False
   1500     # and non-applicable functions
   1501     # try to python agg
   1502     # TODO: shouldn't min_count matter?
-> 1503     result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
   1505 return result

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1457, in GroupBy._agg_py_fallback(self, values, ndim, alt)
   1452     ser = df.iloc[:, 0]
   1454 # We do not get here with UDFs, so we know that our dtype
   1455 #  should always be preserved by the implemented aggregations
   1456 # TODO: Is this exactly right; see WrappedCythonOp get_result_dtype?
-> 1457 res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
   1459 if isinstance(values, Categorical):
   1460     # Because we only get here with known dtype-preserving
   1461     #  reductions, we cast back to Categorical.
   1462     # TODO: if we ever get "rank" working, exclude it here.
   1463     res_values = type(values)._from_sequence(res_values, dtype=values.dtype)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:994, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
    987 if len(obj) > 0 and not isinstance(obj._values, np.ndarray):
    988     # we can preserve a little bit more aggressively with EA dtype
    989     #  because maybe_cast_pointwise_result will do a try/except
    990     #  with _from_sequence.  NB we are assuming here that _from_sequence
    991     #  is sufficiently strict that it casts appropriately.
    992     preserve_dtype = True
--> 994 result = self._aggregate_series_pure_python(obj, func)
    996 npvalues = lib.maybe_convert_objects(result, try_float=False)
    997 if preserve_dtype:

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:1015, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
   1012 splitter = self._get_splitter(obj, axis=0)
   1014 for i, group in enumerate(splitter):
-> 1015     res = func(group)
   1016     res = libreduction.extract_result(res)
   1018     if not initialized:
   1019         # We only do this validation on the first iteration

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1857, in GroupBy.mean.<locals>.<lambda>(x)
   1853     return self._numba_agg_general(sliding_mean, engine_kwargs)
   1854 else:
   1855     result = self._cython_agg_general(
   1856         "mean",
-> 1857         alt=lambda x: Series(x).mean(numeric_only=numeric_only),
   1858         numeric_only=numeric_only,
   1859     )
   1860     return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations.<locals>.mean(self, axis, skipna, numeric_only, **kwargs)
  11539 @doc(
  11540     _num_doc,
  11541     desc="Return the mean of the values over the requested axis.",
   (...)
  11554     **kwargs,
  11555 ):
> 11556     return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11201, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  11194 def mean(
  11195     self,
  11196     axis: Axis | None = 0,
   (...)
  11199     **kwargs,
  11200 ) -> Series | float:
> 11201     return self._stat_function(
  11202         "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
  11203     )

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11158, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
  11154     nv.validate_stat_func((), kwargs, fname=name)
  11156 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 11158 return self._reduce(
  11159     func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11160 )

File ~\anaconda3\Lib\site-packages\pandas\core\series.py:4670, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4665     raise TypeError(
   4666         f"Series.{name} does not allow {kwd_name}={numeric_only} "
   4667         "with non-numeric dtypes."
   4668     )
   4669 with np.errstate(all="ignore"):
-> 4670     return op(delegate, skipna=skipna, **kwds)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:96, in disallow.__call__.<locals>._f(*args, **kwargs)
     94 try:
     95     with np.errstate(invalid="ignore"):
---> 96         return f(*args, **kwargs)
     97 except ValueError as e:
     98     # we want to transform an object array
     99     # ValueError message to the more typical TypeError
    100     # e.g. this is normally a disallowed function on
    101     # object arrays that contain strings
    102     if is_object_dtype(args[0]):

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:158, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
    156         result = alt(values, axis=axis, skipna=skipna, **kwds)
    157 else:
--> 158     result = alt(values, axis=axis, skipna=skipna, **kwds)
    160 return result

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:421, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
    418 if datetimelike and mask is None:
    419     mask = isna(values)
--> 421 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
    423 if datetimelike:
    424     result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:727, in nanmean(values, axis, skipna, mask)
    724     dtype_count = dtype
    726 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 727 the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    729 if axis is not None and getattr(the_sum, "ndim", False):
    730     count = cast(np.ndarray, count)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1699, in _ensure_numeric(x)
   1696             x = complex(x)
   1697         except ValueError as err:
   1698             # e.g. "foo"
-> 1699             raise TypeError(f"Could not convert {x} to numeric") from err
   1700 return x

TypeError: Could not convert 889095?1001051008810016517515315018017017511072100888690707665696070 to numeric
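
A note on reading that last line: the long "number" in the message looks like many text values glued together, which is what happens when a column of strings is summed. The sketch below reproduces that step on its own with made-up values (the "?" entry is an assumption about what the file might contain, not something confirmed from mpg.csv).

```python
import numpy as np

# Made-up values: numbers stored as text, plus one placeholder "?".
values = np.array(["88", "90", "95", "?"], dtype=object)

# Summing an object array of strings concatenates them instead of adding.
total = values.sum()
print(repr(total))

# Converting the concatenated text to a number then fails,
# which is the ValueError seen in the middle of the traceback.
try:
    float(total)
    failed = False
except ValueError:
    failed = True
print(failed)
```

So the message points at a column that pandas read as text rather than as numbers.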

**deanhystad** · Feb-29-2024, 04:53 PM

My guess is mpg.csv contains something that confuses pandas.read_csv.

I can get the same error like this:

import pandas as pd

df = pd.read_csv("data.csv")

print(df)
print(df.dtypes)
print(df.groupby("model_year").mean()["mpg"])

Output:
   model_year mpg
0           1   l
1           2   1
2           2   2
3           3   1
4           3   2
5           3   3
model_year     int64
mpg           object
dtype: object

Error:
Traceback (most recent call last):
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1871, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 850, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 871, in _aggregate_series_pure_python
    res = func(group)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2377, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "venv\lib\site-packages\pandas\core\series.py", line 6221, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "venv\lib\site-packages\pandas\core\generic.py", line 11978, in mean
    return self._stat_function(
  File "venv\lib\site-packages\pandas\core\generic.py", line 11935, in _stat_function
    return self._reduce(
  File "venv\lib\site-packages\pandas\core\series.py", line 6129, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 1693, in _ensure_numeric
    raise TypeError(f"Could not convert string '{x}' to numeric")
TypeError: Could not convert string 'l' to numeric

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(df.groupby("model_year").mean()["mpg"])
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2375, in mean
    result = self._cython_agg_general(
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1926, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "venv\lib\site-packages\pandas\core\internals\managers.py", line 1428, in grouped_reduce
    applied = sb.apply(func)
  File "venv\lib\site-packages\pandas\core\internals\blocks.py", line 366, in apply
    result = func(self.values, **kwargs)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1923, in array_func
    result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1875, in _agg_py_fallback
    raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]

In my example, the CSV file has a lowercase L instead of a one in the mpg column. You get a slightly different error trace, but that may be because your mpg.csv has a slightly different problem. Can you post mpg.csv?
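
One way to hunt for the offending value is to coerce the column to numbers and look at what failed to convert. This is only a sketch: the CSV text below is a made-up stand-in for the example file above, and the names `as_numbers` and `bad_rows` are just for illustration.

```python
import io
import pandas as pd

# Made-up stand-in for the example file: the first mpg value is a
# lowercase "l" rather than a one.
csv_text = "model_year,mpg\n1,l\n2,1\n2,2\n3,1\n3,2\n3,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# Coerce to numbers; anything that cannot be parsed becomes NaN.
as_numbers = pd.to_numeric(df["mpg"], errors="coerce")

# Rows that had a value in the file but did not parse as a number.
bad_rows = df[as_numbers.isna() & df["mpg"].notna()]
print(bad_rows)

# With the column converted, the grouped mean can be computed.
df["mpg"] = as_numbers
means = df.groupby("model_year")["mpg"].mean()
print(means)
```

Whether to drop, fix, or keep the bad rows as NaN depends on the data; coercing silently hides them, so it is worth printing them first.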

samgardner5 · Feb-29-2024, 05:06 PM

Attaching the file -- hopefully this helps!

**deanhystad** · (This post was last modified: Feb-29-2024, 06:13 PM by deanhystad.)

Instead of this

mpgy = mpg.groupby("model_year").mean()["mpg"]

You need this

mpgy = mpg.groupby("model_year")["mpg"].mean()

The top one computes the mean for all the columns and then selects the "mpg" column. That doesn't work when some columns are not numeric, such as "name" in your file. The second one computes the mean of only the "mpg" column. You can see this in the example below, where mean() is applied to mpg and weight.

import pandas as pd

df = pd.read_csv("mpg.csv")[["model_year", "mpg", "weight"]]
print(df.groupby("model_year").mean())

Output:
                  mpg       weight
model_year
70          17.689655  3372.793103
71          21.250000  2995.428571
72          18.714286  3237.714286
73          17.100000  3419.025000
74          22.703704  2877.925926
75          20.266667  3176.800000
76          21.573529  3078.735294
77          23.375000  2997.357143
78          24.061111  2861.805556
79          25.093103  3055.344828
80          33.696552  2436.655172
81          30.334483  2522.931034
82          31.709677  2453.548387
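
The same point can be checked on a tiny frame with a text column. This is a sketch with made-up data, not the real mpg.csv. It also shows `numeric_only=True`, the parameter visible in the `GroupBy.mean` signature in the traceback, as another way to skip text columns, and why min() did not complain: text values can be compared, so a minimum exists even though a mean does not.

```python
import pandas as pd

# Made-up miniature of the data: "name" holds text.
df = pd.DataFrame({
    "model_year": [70, 70, 71, 71],
    "mpg": [18.0, 15.0, 27.0, 25.0],
    "name": ["a", "b", "c", "d"],
})

# Select the column first, then aggregate.
by_column = df.groupby("model_year")["mpg"].mean()

# Or aggregate everything but tell pandas to skip non-numeric columns.
by_numeric = df.groupby("model_year").mean(numeric_only=True)["mpg"]

# min() works on text too, because strings can be ordered.
mins = df.groupby("model_year").min()

print(by_column)
print(by_numeric)
print(mins)
```

Selecting the column first is usually the cheaper option, since pandas never touches the other columns.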
