DF.groupby(col).min works, mean gets a "not implemented" error

samgardner5 · Feb-29-2024, 04:20 PM

I'm working through some data visualization tutorials, and in my Anaconda installation on Windows the pandas mean() function doesn't seem to work, and I'm not sure why. min() and max() both work with exactly the same line.

I don't think I've failed to import anything, so I'm pretty mystified why it should be failing. Anyone have any ideas?

I've attached a file with the entire Jupyter notebook I'm working with (it's not long). The line that fails comes after creating "mpg" as a DataFrame with "model_year" as one of its columns:

#!/usr/bin/env python
# coding: utf-8
# In[1]:

import pandas as pd

# In[2]:
import numpy as np

# In[7]:
mpg = pd.read_csv("mpg.csv")
# In[6]:
get_ipython().run_line_magic('matplotlib', 'inline')
# In[8]:
mpg.head()

# In[9]:
mpgy = mpg.groupby("model_year").mean()["mpg"]

The error message I get is:

---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
1489 try:
-> 1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
1493 how,
1494 axis=data.ndim - 1,
1495 min_count=min_count,
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
1501 # try to python agg
1502 # TODO: shouldn't min_count matter?

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:959, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
958 ngroups = self.ngroups
--> 959 return cy_op.cython_operation(
960 values=values,
961 axis=axis,
962 min_count=min_count,
963 comp_ids=ids,
964 ngroups=ngroups,
965 **kwargs,
966 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:657, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
649 return self._ea_wrap_cython_operation(
650 values,
651 min_count=min_count,
(...)
654 **kwargs,
655 )
--> 657 return self._cython_op_ndim_compat(
658 values,
659 min_count=min_count,
660 ngroups=ngroups,
661 comp_ids=comp_ids,
662 mask=None,
663 **kwargs,
664 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:497, in WrappedCythonOp._cython_op_ndim_compat(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
495 return res.T
--> 497 return self._call_cython_op(
498 values,
499 min_count=min_count,
500 ngroups=ngroups,
501 comp_ids=comp_ids,
502 mask=mask,
503 result_mask=result_mask,
504 **kwargs,
505 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:541, in WrappedCythonOp._call_cython_op(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
540 out_shape = self._get_output_shape(ngroups, values)
--> 541 func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
542 values = self._get_cython_vals(values)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:173, in WrappedCythonOp._get_cython_function(cls, kind, how, dtype, is_numeric)
171 if "object" not in f.__signatures__:
172 # raise NotImplementedError here rather than TypeError later
--> 173 raise NotImplementedError(
174 f"function is not implemented for this dtype: "
175 f"[how->{how},dtype->{dtype_str}]"
176 )
177 return f

NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1692, in _ensure_numeric(x)
1691 try:
-> 1692 x = float(x)
1693 except (TypeError, ValueError):
1694 # e.g. "1+1j" or "foo"

ValueError: could not convert string to float: '889095?1001051008810016517515315018017017511072100888690707665696070'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1696, in _ensure_numeric(x)
1695 try:
-> 1696 x = complex(x)
1697 except ValueError as err:
1698 # e.g. "foo"

ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 mpgy = mpg.groupby("model_year").mean()["mpg"]

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1855, in GroupBy.mean(self, numeric_only, engine, engine_kwargs)
1853 return self._numba_agg_general(sliding_mean, engine_kwargs)
1854 else:
-> 1855 result = self._cython_agg_general(
1856 "mean",
1857 alt=lambda x: Series(x).mean(numeric_only=numeric_only),
1858 numeric_only=numeric_only,
1859 )
1860 return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1507, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, **kwargs)
1503 result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
1505 return result
-> 1507 new_mgr = data.grouped_reduce(array_func)
1508 res = self._wrap_agged_manager(new_mgr)
1509 out = self._wrap_aggregated_output(res)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:1503, in BlockManager.grouped_reduce(self, func)
1499 if blk.is_object:
1500 # split on object-dtype blocks bc some columns may raise
1501 # while others do not.
1502 for sb in blk._split():
-> 1503 applied = sb.apply(func)
1504 result_blocks = extend_blocks(applied, result_blocks)
1505 else:

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:329, in Block.apply(self, func, **kwargs)
323 @final
324 def apply(self, func, **kwargs) -> list[Block]:
325 """
326 apply the function to my values; return a block if we are not
327 one
328 """
--> 329 result = func(self.values, **kwargs)
331 return self._split_op_result(result)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1503, in GroupBy._cython_agg_general.<locals>.array_func(values)
1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
(...)
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
1501 # try to python agg
1502 # TODO: shouldn't min_count matter?
-> 1503 result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
1505 return result

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1457, in GroupBy._agg_py_fallback(self, values, ndim, alt)
1452 ser = df.iloc[:, 0]
1454 # We do not get here with UDFs, so we know that our dtype
1455 # should always be preserved by the implemented aggregations
1456 # TODO: Is this exactly right; see WrappedCythonOp get_result_dtype?
-> 1457 res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
1459 if isinstance(values, Categorical):
1460 # Because we only get here with known dtype-preserving
1461 # reductions, we cast back to Categorical.
1462 # TODO: if we ever get "rank" working, exclude it here.
1463 res_values = type(values)._from_sequence(res_values, dtype=values.dtype)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:994, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
987 if len(obj) > 0 and not isinstance(obj._values, np.ndarray):
988 # we can preserve a little bit more aggressively with EA dtype
989 # because maybe_cast_pointwise_result will do a try/except
990 # with _from_sequence. NB we are assuming here that _from_sequence
991 # is sufficiently strict that it casts appropriately.
992 preserve_dtype = True
--> 994 result = self._aggregate_series_pure_python(obj, func)
996 npvalues = lib.maybe_convert_objects(result, try_float=False)
997 if preserve_dtype:

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:1015, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
1012 splitter = self._get_splitter(obj, axis=0)
1014 for i, group in enumerate(splitter):
-> 1015 res = func(group)
1016 res = libreduction.extract_result(res)
1018 if not initialized:
1019 # We only do this validation on the first iteration

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1857, in GroupBy.mean.<locals>.<lambda>(x)
1853 return self._numba_agg_general(sliding_mean, engine_kwargs)
1854 else:
1855 result = self._cython_agg_general(
1856 "mean",
-> 1857 alt=lambda x: Series(x).mean(numeric_only=numeric_only),
1858 numeric_only=numeric_only,
1859 )
1860 return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations.<locals>.mean(self, axis, skipna, numeric_only, **kwargs)
11539 @doc(
11540 _num_doc,
11541 desc="Return the mean of the values over the requested axis.",
(...)
11554 **kwargs,
11555 ):
-> 11556 return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11201, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
11194 def mean(
11195 self,
11196 axis: Axis | None = 0,
(...)
11199 **kwargs,
11200 ) -> Series | float:
-> 11201 return self._stat_function(
11202 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
11203 )

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11158, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
11154 nv.validate_stat_func((), kwargs, fname=name)
11156 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
-> 11158 return self._reduce(
11159 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
11160 )

File ~\anaconda3\Lib\site-packages\pandas\core\series.py:4670, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4665 raise TypeError(
4666 f"Series.{name} does not allow {kwd_name}={numeric_only} "
4667 "with non-numeric dtypes."
4668 )
4669 with np.errstate(all="ignore"):
-> 4670 return op(delegate, skipna=skipna, **kwds)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:96, in disallow.__call__.<locals>._f(*args, **kwargs)
94 try:
95 with np.errstate(invalid="ignore"):
---> 96 return f(*args, **kwargs)
97 except ValueError as e:
98 # we want to transform an object array
99 # ValueError message to the more typical TypeError
100 # e.g. this is normally a disallowed function on
101 # object arrays that contain strings
102 if is_object_dtype(args[0]):

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:158, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
156 result = alt(values, axis=axis, skipna=skipna, **kwds)
157 else:
--> 158 result = alt(values, axis=axis, skipna=skipna, **kwds)
160 return result

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:421, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
418 if datetimelike and mask is None:
419 mask = isna(values)
--> 421 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
423 if datetimelike:
424 result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:727, in nanmean(values, axis, skipna, mask)
724 dtype_count = dtype
726 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 727 the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
729 if axis is not None and getattr(the_sum, "ndim", False):
730 count = cast(np.ndarray, count)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1699, in _ensure_numeric(x)
1696 x = complex(x)
1697 except ValueError as err:
1698 # e.g. "foo"
-> 1699 raise TypeError(f"Could not convert {x} to numeric") from err
1700 return x

TypeError: Could not convert 889095?1001051008810016517515315018017017511072100888690707665696070 to numeric
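The value in that final TypeError, '889095?1001051008810016517515315018017017511072100888690707665696070', looks like a run of numbers (88, 90, 95, ...) glued together with a '?' among them. mean() first sums each group, and summing strings concatenates them, which is why the numbers run together. That suggests (an assumption about mpg.csv, not confirmed in this thread) that some numeric column uses '?' to mark missing values, so read_csv reads the whole column as strings. The sketch below uses a small made-up CSV (the column name "hp" and its values are hypothetical) to show the effect and the na_values parameter of read_csv, which turns '?' into NaN:

```python
import io

import pandas as pd

# Hypothetical CSV: the "hp" column uses "?" for a missing value.
csv_text = """model_year,mpg,hp
70,18.0,130
70,15.0,?
71,24.0,95
71,22.0,88
"""

# Without na_values, the "?" forces the whole hp column to strings (object dtype).
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# With na_values="?", "?" becomes NaN and hp is parsed as a float column.
clean = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(clean.dtypes)

# Every column is numeric now, so a whole-frame groupby mean works;
# mean() skips the NaN by default.
print(clean.groupby("model_year").mean())
```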

**deanhystad** · Feb-29-2024, 04:53 PM

My guess is that mpg.csv contains something that confuses pandas.read_csv.

I can get the same error like this:

import pandas as pd

df = pd.read_csv("data.csv")

print(df)
print(df.dtypes)
print(df.groupby("model_year").mean()["mpg"])

Output:   model_year mpg
0           1   l
1           2   1
2           2   2
3           3   1
4           3   2
5           3   3
model_year     int64
mpg           object
dtype: object

Error:Traceback (most recent call last):
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1871, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 850, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 871, in _aggregate_series_pure_python
    res = func(group)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2377, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "venv\lib\site-packages\pandas\core\series.py", line 6221, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "venv\lib\site-packages\pandas\core\generic.py", line 11978, in mean
    return self._stat_function(
  File "venv\lib\site-packages\pandas\core\generic.py", line 11935, in _stat_function
    return self._reduce(
  File "venv\lib\site-packages\pandas\core\series.py", line 6129, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 1693, in _ensure_numeric
    raise TypeError(f"Could not convert string '{x}' to numeric")
TypeError: Could not convert string 'l' to numeric

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(df.groupby("model_year").mean()["mpg"])
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2375, in mean
    result = self._cython_agg_general(
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1926, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "venv\lib\site-packages\pandas\core\internals\managers.py", line 1428, in grouped_reduce
    applied = sb.apply(func)
  File "venv\lib\site-packages\pandas\core\internals\blocks.py", line 366, in apply
    result = func(self.values, **kwargs)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1923, in array_func
    result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1875, in _agg_py_fallback
    raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]

In my example, data.csv has a lowercase L instead of a one in the mpg column. Your error trace is slightly different, but that may be because your mpg.csv has a slightly different problem. Can you post mpg.csv?
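One way to narrow this down before posting the file is to list the columns pandas did not parse as numbers and then show which raw values fail numeric conversion. A minimal sketch, using an in-memory stand-in for the data.csv example above (the CSV text is made up to mirror the lowercase-L case):

```python
import io

import pandas as pd

# Stand-in for data.csv: row 0 has a lowercase "l" instead of a 1.
csv_text = """model_year,mpg
1,l
2,1
2,2
3,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns pandas could not parse as numbers.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", non_numeric)

# For each such column, show the raw values that fail numeric conversion.
for col in non_numeric:
    converted = pd.to_numeric(df[col], errors="coerce")
    bad = df.loc[converted.isna() & df[col].notna(), col]
    print(col, "->", bad.unique().tolist())
```

Genuinely textual columns (a car name, for example) will be listed too, with every value flagged; those are expected and just need to be left out of numeric aggregations.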

samgardner5 · Feb-29-2024, 05:06 PM

Attaching the file -- hopefully this helps!

**deanhystad** · Feb-29-2024, 06:13 PM (edited)

Instead of this

mpgy = mpg.groupby("model_year").mean()["mpg"]

You need this

mpgy = mpg.groupby("model_year")["mpg"].mean()

The first version computes the mean of every column and then selects the "mpg" column. That fails when some columns are not numeric, such as "name" in your file. The second version selects the "mpg" column first and computes the mean of only that column. You can see this in the example below, where mean() is applied to mpg and weight.

import pandas as pd

df = pd.read_csv("mpg.csv")[["model_year", "mpg", "weight"]]
print(df.groupby("model_year").mean())

Output:                  mpg       weight
model_year
70          17.689655  3372.793103
71          21.250000  2995.428571
72          18.714286  3237.714286
73          17.100000  3419.025000
74          22.703704  2877.925926
75          20.266667  3176.800000
76          21.573529  3078.735294
77          23.375000  2997.357143
78          24.061111  2861.805556
79          25.093103  3055.344828
80          33.696552  2436.655172
81          30.334483  2522.931034
82          31.709677  2453.548387
