Python Forum
DF.groupby(col).min works, mean gets a "not implemented" error
#1
I'm working through some data visualization tutorials, and in my Anaconda installation on Windows the pandas mean() function doesn't seem to work, and I'm not sure why. min() and max() both work with exactly the same line.

I don't think I've failed to import anything, so I'm pretty mystified why it should be failing. Anyone have any ideas?

I've put in a file showing the entire Jupyter notebook I'm working with (it's not long), but the line where it's failing is after creating "mpg" as a dataframe with "model_year" as one of the columns:


#!/usr/bin/env python
# coding: utf-8
# In[1]:

import pandas as pd

# In[2]:
import numpy as np

# In[7]:
mpg = pd.read_csv("mpg.csv")
# In[6]:
get_ipython().run_line_magic('matplotlib', 'inline')
# In[8]:
mpg.head()

# In[9]:
mpgy = mpg.groupby("model_year").mean()["mpg"]
The error message I get is:
Quote:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1490, in GroupBy._cython_agg_general.<locals>.array_func(values)
1489 try:
-> 1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
1493 how,
1494 axis=data.ndim - 1,
1495 min_count=min_count,
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
1501 # try to python agg
1502 # TODO: shouldn't min_count matter?

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:959, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
958 ngroups = self.ngroups
--> 959 return cy_op.cython_operation(
960 values=values,
961 axis=axis,
962 min_count=min_count,
963 comp_ids=ids,
964 ngroups=ngroups,
965 **kwargs,
966 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:657, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
649 return self._ea_wrap_cython_operation(
650 values,
651 min_count=min_count,
(...)
654 **kwargs,
655 )
--> 657 return self._cython_op_ndim_compat(
658 values,
659 min_count=min_count,
660 ngroups=ngroups,
661 comp_ids=comp_ids,
662 mask=None,
663 **kwargs,
664 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:497, in WrappedCythonOp._cython_op_ndim_compat(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
495 return res.T
--> 497 return self._call_cython_op(
498 values,
499 min_count=min_count,
500 ngroups=ngroups,
501 comp_ids=comp_ids,
502 mask=mask,
503 result_mask=result_mask,
504 **kwargs,
505 )

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:541, in WrappedCythonOp._call_cython_op(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
540 out_shape = self._get_output_shape(ngroups, values)
--> 541 func = self._get_cython_function(self.kind, self.how, values.dtype, is_numeric)
542 values = self._get_cython_vals(values)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:173, in WrappedCythonOp._get_cython_function(cls, kind, how, dtype, is_numeric)
171 if "object" not in f.__signatures__:
172 # raise NotImplementedError here rather than TypeError later
--> 173 raise NotImplementedError(
174 f"function is not implemented for this dtype: "
175 f"[how->{how},dtype->{dtype_str}]"
176 )
177 return f

NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1692, in _ensure_numeric(x)
1691 try:
-> 1692 x = float(x)
1693 except (TypeError, ValueError):
1694 # e.g. "1+1j" or "foo"

ValueError: could not convert string to float: '889095?1001051008810016517515315018017017511072100888690707665696070'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1696, in _ensure_numeric(x)
1695 try:
-> 1696 x = complex(x)
1697 except ValueError as err:
1698 # e.g. "foo"

ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 mpgy = mpg.groupby("model_year").mean()["mpg"]

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1855, in GroupBy.mean(self, numeric_only, engine, engine_kwargs)
1853 return self._numba_agg_general(sliding_mean, engine_kwargs)
1854 else:
-> 1855 result = self._cython_agg_general(
1856 "mean",
1857 alt=lambda x: Series(x).mean(numeric_only=numeric_only),
1858 numeric_only=numeric_only,
1859 )
1860 return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1507, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count, **kwargs)
1503 result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
1505 return result
-> 1507 new_mgr = data.grouped_reduce(array_func)
1508 res = self._wrap_agged_manager(new_mgr)
1509 out = self._wrap_aggregated_output(res)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:1503, in BlockManager.grouped_reduce(self, func)
1499 if blk.is_object:
1500 # split on object-dtype blocks bc some columns may raise
1501 # while others do not.
1502 for sb in blk._split():
-> 1503 applied = sb.apply(func)
1504 result_blocks = extend_blocks(applied, result_blocks)
1505 else:

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:329, in Block.apply(self, func, **kwargs)
323 @final
324 def apply(self, func, **kwargs) -> list[Block]:
325 """
326 apply the function to my values; return a block if we are not
327 one
328 """
--> 329 result = func(self.values, **kwargs)
331 return self._split_op_result(result)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1503, in GroupBy._cython_agg_general.<locals>.array_func(values)
1490 result = self.grouper._cython_operation(
1491 "aggregate",
1492 values,
(...)
1496 **kwargs,
1497 )
1498 except NotImplementedError:
1499 # generally if we have numeric_only=False
1500 # and non-applicable functions
1501 # try to python agg
1502 # TODO: shouldn't min_count matter?
-> 1503 result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
1505 return result

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1457, in GroupBy._agg_py_fallback(self, values, ndim, alt)
1452 ser = df.iloc[:, 0]
1454 # We do not get here with UDFs, so we know that our dtype
1455 # should always be preserved by the implemented aggregations
1456 # TODO: Is this exactly right; see WrappedCythonOp get_result_dtype?
-> 1457 res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
1459 if isinstance(values, Categorical):
1460 # Because we only get here with known dtype-preserving
1461 # reductions, we cast back to Categorical.
1462 # TODO: if we ever get "rank" working, exclude it here.
1463 res_values = type(values)._from_sequence(res_values, dtype=values.dtype)

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:994, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
987 if len(obj) > 0 and not isinstance(obj._values, np.ndarray):
988 # we can preserve a little bit more aggressively with EA dtype
989 # because maybe_cast_pointwise_result will do a try/except
990 # with _from_sequence. NB we are assuming here that _from_sequence
991 # is sufficiently strict that it casts appropriately.
992 preserve_dtype = True
--> 994 result = self._aggregate_series_pure_python(obj, func)
996 npvalues = lib.maybe_convert_objects(result, try_float=False)
997 if preserve_dtype:

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\ops.py:1015, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
1012 splitter = self._get_splitter(obj, axis=0)
1014 for i, group in enumerate(splitter):
-> 1015 res = func(group)
1016 res = libreduction.extract_result(res)
1018 if not initialized:
1019 # We only do this validation on the first iteration

File ~\anaconda3\Lib\site-packages\pandas\core\groupby\groupby.py:1857, in GroupBy.mean.<locals>.<lambda>(x)
1853 return self._numba_agg_general(sliding_mean, engine_kwargs)
1854 else:
1855 result = self._cython_agg_general(
1856 "mean",
-> 1857 alt=lambda x: Series(x).mean(numeric_only=numeric_only),
1858 numeric_only=numeric_only,
1859 )
1860 return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations.<locals>.mean(self, axis, skipna, numeric_only, **kwargs)
11539 @doc(
11540 _num_doc,
11541 desc="Return the mean of the values over the requested axis.",
(...)
11554 **kwargs,
11555 ):
> 11556 return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11201, in NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
11194 def mean(
11195 self,
11196 axis: Axis | None = 0,
(...)
11199 **kwargs,
11200 ) -> Series | float:
> 11201 return self._stat_function(
11202 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
11203 )

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:11158, in NDFrame._stat_function(self, name, func, axis, skipna, numeric_only, **kwargs)
11154 nv.validate_stat_func((), kwargs, fname=name)
11156 validate_bool_kwarg(skipna, "skipna", none_allowed=False)
> 11158 return self._reduce(
11159 func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
11160 )

File ~\anaconda3\Lib\site-packages\pandas\core\series.py:4670, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4665 raise TypeError(
4666 f"Series.{name} does not allow {kwd_name}={numeric_only} "
4667 "with non-numeric dtypes."
4668 )
4669 with np.errstate(all="ignore"):
-> 4670 return op(delegate, skipna=skipna, **kwds)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:96, in disallow.__call__.<locals>._f(*args, **kwargs)
94 try:
95 with np.errstate(invalid="ignore"):
---> 96 return f(*args, **kwargs)
97 except ValueError as e:
98 # we want to transform an object array
99 # ValueError message to the more typical TypeError
100 # e.g. this is normally a disallowed function on
101 # object arrays that contain strings
102 if is_object_dtype(args[0]):

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:158, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
156 result = alt(values, axis=axis, skipna=skipna, **kwds)
157 else:
--> 158 result = alt(values, axis=axis, skipna=skipna, **kwds)
160 return result

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:421, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
418 if datetimelike and mask is None:
419 mask = isna(values)
--> 421 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
423 if datetimelike:
424 result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:727, in nanmean(values, axis, skipna, mask)
724 dtype_count = dtype
726 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 727 the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
729 if axis is not None and getattr(the_sum, "ndim", False):
730 count = cast(np.ndarray, count)

File ~\anaconda3\Lib\site-packages\pandas\core\nanops.py:1699, in _ensure_numeric(x)
1696 x = complex(x)
1697 except ValueError as err:
1698 # e.g. "foo"
-> 1699 raise TypeError(f"Could not convert {x} to numeric") from err
1700 return x

TypeError: Could not convert 889095?1001051008810016517515315018017017511072100888690707665696070 to numeric
#2
My guess is mpg.csv contains something that confuses pandas.read_csv.

I can get the same error like this:
import pandas as pd

df = pd.read_csv("data.csv")

print(df)
print(df.dtypes)
print(df.groupby("model_year").mean()["mpg"])
Output:
   model_year mpg
0           1   l
1           2   1
2           2   2
3           3   1
4           3   2
5           3   3
model_year     int64
mpg           object
dtype: object
Error:
Traceback (most recent call last):
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1871, in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 850, in agg_series
    result = self._aggregate_series_pure_python(obj, func)
  File "venv\lib\site-packages\pandas\core\groupby\ops.py", line 871, in _aggregate_series_pure_python
    res = func(group)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2377, in <lambda>
    alt=lambda x: Series(x).mean(numeric_only=numeric_only),
  File "venv\lib\site-packages\pandas\core\series.py", line 6221, in mean
    return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
  File "venv\lib\site-packages\pandas\core\generic.py", line 11978, in mean
    return self._stat_function(
  File "venv\lib\site-packages\pandas\core\generic.py", line 11935, in _stat_function
    return self._reduce(
  File "venv\lib\site-packages\pandas\core\series.py", line 6129, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "venv\lib\site-packages\pandas\core\nanops.py", line 1693, in _ensure_numeric
    raise TypeError(f"Could not convert string '{x}' to numeric")
TypeError: Could not convert string 'l' to numeric

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(df.groupby("model_year").mean()["mpg"])
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 2375, in mean
    result = self._cython_agg_general(
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1926, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "venv\lib\site-packages\pandas\core\internals\managers.py", line 1428, in grouped_reduce
    applied = sb.apply(func)
  File "venv\lib\site-packages\pandas\core\internals\blocks.py", line 366, in apply
    result = func(self.values, **kwargs)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1923, in array_func
    result = self._agg_py_fallback(how, values, ndim=data.ndim, alt=alt)
  File "venv\lib\site-packages\pandas\core\groupby\groupby.py", line 1875, in _agg_py_fallback
    raise type(err)(msg) from err
TypeError: agg function failed [how->mean,dtype->object]
In my example, the mpg column contains a lowercase L instead of a one, so pandas reads the whole column as object rather than as numbers. You get a slightly different error trace, but that may be due to your mpg.csv file having a slightly different error. Can you post mpg.csv?
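A quick way to track down a value like that is to check the dtypes and then coerce the suspect column to numbers. This is only a minimal sketch on made-up data: the inline CSV, the `horsepower` column name, and the `?` placeholder are assumptions for illustration (the `?` is modelled on the one visible in the error message in the first post), not the contents of the real mpg.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: one "?" in an otherwise numeric column.
csv_text = """model_year,mpg,horsepower
70,18.0,130
70,15.0,?
71,27.0,88
71,26.0,90
"""
df = pd.read_csv(io.StringIO(csv_text))

# A column that should be numeric but is not reported as int/float is the suspect.
print(df.dtypes)

# Coerce to numeric; values that cannot be parsed become NaN instead of raising.
as_numeric = pd.to_numeric(df["horsepower"], errors="coerce")

# Rows where coercion produced NaN although the original cell was not empty.
bad_rows = df[as_numeric.isna() & df["horsepower"].notna()]
print(bad_rows)
```

If the offending values turn out to be a missing-data marker, passing it to `pd.read_csv` through the `na_values` parameter is one way to have the column read as numeric in the first place.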
#3
Attaching the file -- hopefully this helps!

Attached Files

.csv   mpg.csv (Size: 17.31 KB / Downloads: 15)
#4
Instead of this
mpgy = mpg.groupby("model_year").mean()["mpg"]
You need this
mpgy = mpg.groupby("model_year")["mpg"].mean()
The top one computes the mean for all the columns and then selects the "mpg" column. That doesn't work when some columns are not numeric, such as "name" in your file. The second one computes the mean of only the "mpg" column. You can see this in the example below, where mean() is applied to mpg and weight.
import pandas as pd

df = pd.read_csv("mpg.csv")[["model_year", "mpg", "weight"]]
print(df.groupby("model_year").mean())
Output:
                  mpg       weight
model_year
70          17.689655  3372.793103
71          21.250000  2995.428571
72          18.714286  3237.714286
73          17.100000  3419.025000
74          22.703704  2877.925926
75          20.266667  3176.800000
76          21.573529  3078.735294
77          23.375000  2997.357143
78          24.061111  2861.805556
79          25.093103  3055.344828
80          33.696552  2436.655172
81          30.334483  2522.931034
82          31.709677  2453.548387
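To see the two spellings side by side when a text column is present, here is a small sketch. The inline CSV is made-up data, not the real file, and `numeric_only=True` is an alternative that pandas offers for skipping non-numeric columns; whether it is preferable to selecting the column first is a judgment call.

```python
import io

import pandas as pd

# Made-up rows for illustration; "name" is a text column like the one in the real file.
csv_text = """model_year,mpg,weight,name
70,18.0,3504,chevrolet chevelle malibu
70,15.0,3693,buick skylark 320
71,27.0,2130,datsun pl510
71,25.0,2228,toyota corona
"""
mpg = pd.read_csv(io.StringIO(csv_text))

# Select the column first, then aggregate: only "mpg" is averaged.
by_column = mpg.groupby("model_year")["mpg"].mean()

# Or aggregate every column but tell pandas to skip the non-numeric ones.
skip_text = mpg.groupby("model_year").mean(numeric_only=True)["mpg"]

print(by_column)
print(skip_text)
```

Selecting the column first also avoids computing means that are thrown away afterwards, which is why it is usually the tidier choice when only one column is needed.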