Python Forum
can't get rid of '?' within my df!
#1
I've just gotten back on the laptop after an extended break and can't seem to complete this simple task! I have a large dataset, and during preprocessing I can't seem to mark and drop the rows that contain a '?'. I've tried repeatedly to replace the '?'s with NaN so I can drop them, but nothing seems to affect the dataset at all.

Most solutions I've found drop rows when a value occurs in a particular column, but I don't want to go through each column; I'd rather handle the entire dataset at once. My columns also have different types (a mix of float and object), which may be causing some friction.


Here's what I've tried:

train = pre_train.replace('?', 'np.Nan')

train = pre_train.replace({'?': np.nan}).dropna()

train = pre_train.replace({to_replace = "?", value = "NaN"})

train = pre_train.where(pre_train != '?', other = 'NaN')
None of these work, so any help is appreciated. I'll include a small segment of what the dataset looks like (note that there are more columns). If I do the opposite and try to drop every row containing an element that is not '?', I do manage to clear the df, so I'm really confused by this!

[Image: r2TD02D]

I can't seem to work out how to edit my original post, so if a mod could join these two together I would be eternally grateful!
#2
It would be easier to help if I had the original data, or if you provided a minimal reproducible example.
Everything seems to work fine for me; look at the following example:

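import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': ['one', '?', 'two']})
# Set every 'y' cell containing a literal '?' to NaN (pd.np is deprecated and absent from recent pandas)
df.loc[df.y.str.contains('?', regex=False), 'y'] = np.nan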
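
One possible reason a plain replace appears to do nothing is that the cells are not exactly '?' but carry surrounding whitespace, such as ' ?'. That is only a guess about your data. The sketch below uses a made-up frame (the values are my assumption, not your dataset) to compare an exact-match replace with a regex that tolerates whitespace:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one exact '?' and one padded ' ?'
df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': ['one', '?', ' ?']})

# Exact-match replace only catches the cell that equals '?' exactly
exact = df.replace('?', np.nan)

# A regex allowing optional whitespace around '?' catches both cells
padded = df.replace(r'^\s*\?\s*$', np.nan, regex=True)

# Drop every row that now holds a NaN in any column
cleaned = padded.dropna()
```

Both calls act on the whole DataFrame at once, so no per-column loop is needed. Note that replace returns a new frame, so the result has to be assigned to a name.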
#3
I get a type error with this. I did upload an image, though I guess it didn't work.

Find example of data here:

Refer to index 27 to see '?'

I would like to search the entire df rather than a single column. Or would it be necessary to iterate through each column? That seems kind of un-Pythonic.
#4
OK, previously I had uploaded the file wrong; it contained spaces everywhere.

Here it is again:
Refer to [16] for '?'. Now when I use your code, it either removes all content or replaces the entire dataframe with 'nan'.
#5
Did you try something like this?

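# assumes `import numpy as np`; the column is selected with df[...], since df.loc[...] would look up a row label
df.loc[df['native-country'].str.contains('?', regex=False), 'native-country'] = np.nan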
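
That line handles one column. If you would rather check the whole frame in one pass, a boolean mask over every cell might do it. This is only a sketch: the 'age' and 'workclass' columns and all the values are made up for illustration, and I am assuming the placeholder may be padded with spaces.

```python
import pandas as pd

# Hypothetical sample; only the 'native-country' name comes from this thread
df = pd.DataFrame({
    'age': [39.0, 50.0, 38.0],
    'workclass': ['Private', ' ?', 'State-gov'],
    'native-country': ['United-States', 'Cuba', '?'],
})

# View every cell as text so float and object columns can be tested the same way
as_text = df.astype(str)

# True where a cell, after stripping whitespace, is exactly '?'
is_placeholder = as_text.apply(lambda col: col.str.strip() == '?')

# Keep only the rows in which no column holds the placeholder
clean = df[~is_placeholder.any(axis=1)]
```

The original dtypes are kept in clean, because the string conversion is only used to build the mask.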
#6
(Aug-16-2019, 01:12 AM)scidam Wrote: Did you try something like this?

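# assumes `import numpy as np`; the column is selected with df[...], since df.loc[...] would look up a row label
df.loc[df['native-country'].str.contains('?', regex=False), 'native-country'] = np.nan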

It worked wonderfully, thank you