Python Forum
mode()
#1
Hi, I am a newbie to Python.
I am trying to impute a categorical column with its mode, but it fails and I am not sure what the problem is:

data.loc[(data.horsepower.isna() == True), 'horsepower']

206 NaN
265 NaN
326 NaN
Name: horsepower, dtype: object

data.loc[(data.horsepower.isna() == True), 'horsepower'] = data.horsepower.mode()

The 3 missing rows are not replaced when I rerun the above check (it still shows 3 rows with NaN).

But if I replace them with a hardcoded value, it works:
data.loc[(data.horsepower.isna() == True), 'horsepower'] = '9999'

206 9999
265 9999
326 9999
Name: horsepower, dtype: object

Any advice is appreciated. Thank you!
#2
What is the mode of (NaN, NaN, NaN)?

Series.mode() returns a series. The series can have 1 row, multiple rows, or no rows.
import pandas as pd
from numpy import nan  # the NaN alias was removed in NumPy 2.0

# mode() always returns a Series; missing values are ignored by default.
print("[NaN, NaN, NaN] mode", pd.DataFrame({"hp": [nan, nan, nan]}).hp.mode(), sep="\n")
print("", "[NaN, 1, NaN] mode", pd.DataFrame({"hp": [nan, 1, nan]}).hp.mode(), sep="\n")
print("", "[1, 1, 2, 2] mode", pd.DataFrame({"hp": [1, 1, 2, 2]}).hp.mode(), sep="\n")
Output:
[NaN, NaN, NaN] mode
Series([], Name: hp, dtype: float64)

[NaN, 1, NaN] mode
0    1.0
Name: hp, dtype: float64

[1, 1, 2, 2] mode
0    1
1    2
Name: hp, dtype: int64
There are multiple problems with how you are using mode(). My guess is that you thought mode() would return a single value and that every NaN would be replaced with that value. But because the result of mode() is a series, assigning it to the horsepower column matches values by index label instead of broadcasting one value. If row 0 in horsepower is NaN, row 0 of the mode series is copied into it. Any selected row whose label is missing from the mode series stays NaN, as if the mode series had been padded with NaN. In your data the NaNs sit at rows 206, 265 and 326, labels the mode series almost certainly does not have, so nothing is replaced. You can see the padding here, where horsepower has two modes but there are three NaNs.
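import pandas as pd
from numpy import nan  # the NaN alias was removed in NumPy 2.0

df = pd.DataFrame({"hp": [nan, nan, nan, 1, 1, 2, 2]})
mode = df.hp.mode()  # two modes, labelled 0 and 1
# Assigning a Series aligns on index labels, so row 2 finds no match.
df.loc[df.hp.isna(), "hp"] = mode
print(df)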
Output:
    hp
0  1.0
1  2.0
2  NaN
3  1.0
4  1.0
5  2.0
6  2.0
The first two NaNs were replaced with the two modes, but there was no third mode to replace the last NaN. You probably also want to use only one mode value.
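
To tie this back to the original question, here is a small sketch with made-up data (the string values and the labels 0, 1, 2 are my assumptions; only the NaN labels 206, 265 and 326 come from the post). It compares assigning the mode series with assigning a single value taken from it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: string-typed horsepower with missing values at
# labels 206, 265 and 326, loosely mirroring the question.
data = pd.DataFrame(
    {"horsepower": ["150", "150", "95", np.nan, np.nan, np.nan]},
    index=[0, 1, 2, 206, 265, 326],
)

mode = data["horsepower"].mode()   # Series with a single row, labelled 0
mask = data["horsepower"].isna()

# Assigning the Series aligns on labels; 206/265/326 are not in `mode`.
attempt = data.copy()
attempt.loc[mask, "horsepower"] = mode
still_missing = int(attempt["horsepower"].isna().sum())

# Assigning a scalar broadcasts it to every selected row.
fixed = data.copy()
fixed.loc[mask, "horsepower"] = mode.iloc[0]
remaining = int(fixed["horsepower"].isna().sum())

print("missing after assigning the Series:", still_missing)
print("missing after assigning a scalar:", remaining)
```

The only difference between the two assignments is `mode` versus `mode.iloc[0]`, which is also why the hardcoded '9999' worked.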

I would do something like this.
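import pandas as pd
from numpy import nan  # the NaN alias was removed in NumPy 2.0

df = pd.DataFrame({"hp": [nan, nan, nan]})
mode = df.hp.mode()
# Use the first mode as a scalar; 42 is an arbitrary fallback when there is no mode.
fill = mode.iloc[0] if not mode.empty else 42
# Assign the result back instead of calling fillna(inplace=True) on df.hp.
df["hp"] = df["hp"].fillna(fill)
print(df)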
Output:
     hp
0  42.0
1  42.0
2  42.0
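
If you need this for more than one column, the same idea could be wrapped in a small function. This is only a sketch: `fill_with_mode` and its `default` argument are names I made up, not part of pandas, and the sample series are invented. It relies on mode() returning its values sorted, so ties resolve to the smallest mode.

```python
import numpy as np
import pandas as pd


def fill_with_mode(series, default):
    """Return a copy of `series` with missing values replaced by its first mode.

    Hypothetical helper; `default` is used when every value is missing.
    """
    modes = series.mode()  # NaN is ignored by default (dropna=True)
    fill = modes.iloc[0] if not modes.empty else default
    return series.fillna(fill)  # a scalar fill is applied to every missing row


all_missing = pd.Series([np.nan, np.nan, np.nan], name="hp")
two_modes = pd.Series([np.nan, 1, 1, 2, 2], name="hp")
strings = pd.Series(["95", np.nan, "150", "150"],
                    index=[3, 206, 7, 9], name="horsepower")

filled_missing = fill_with_mode(all_missing, 42).tolist()
filled_ties = fill_with_mode(two_modes, 42).tolist()
filled_strings = fill_with_mode(strings, "unknown").tolist()

print(filled_missing)
print(filled_ties)
print(filled_strings)
```

Because the fill value is a scalar, the index labels of the column no longer matter.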