Python Forum

Full Version: can't get rid of '?' within my df!
I've just gotten back on the laptop after an extended break and can't seem to complete this simple task! I've got a large dataset, and while going through my preprocessing I can't seem to mark and drop the rows containing a '?'. I've tried repeatedly to replace the '?'s with NaN so I can drop them, but nothing seems to affect the dataset at all.

Most solutions I've found drop rows when a value occurs in a particular column, but I don't want to go through each column; I'd rather handle the entire dataset at once. My columns also have different types (a mix of float and object), which is perhaps causing some friction.


Here's what I've tried:

train = pre_train.replace('?', 'np.Nan')

train = pre_train.replace({'?': np.nan}).dropna()

train = pre_train.replace({to_replace = "?", value = "NaN"})

train = pre_train.where(pre_train != '?', other = 'NaN')
I can't seem to get any of these to work, so any help is appreciated. Below is a small segment of what the dataset looks like (note there are more columns). If I do the opposite and try to drop all rows that contain an element that is not '?', I manage to clear the df, so I'm really confused by this!

[Image: r2TD02D]

I can't seem to work out how to edit my original post, so if a mod could join these two together I would be eternally grateful!
It would be better if I had the original data, or if you provided a minimal reproducible example.
Everything seems to work fine for me; look at the following example:

import numpy as np  # pd.np was deprecated and later removed from pandas
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y':['one', '?', 'two']})
df.loc[df.y.str.contains('?', regex=False), 'y'] = np.nan
I get a type error with this. I did upload an image, though I guess it didn't work.

Find example of data here:

Refer to index 27 to see '?'

I would like to search the entire df rather than a single column. Or would it be necessary to iterate through each column? That seems kind of un-Pythonic, though.
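For the whole-dataframe case, something along these lines might work without iterating over columns. This is only a sketch on a made-up toy frame (the column names and values are placeholders, not the real data), and it assumes the '?' cells may carry stray spaces around them. It relies on DataFrame.replace with regex=True, which only touches string cells and leaves the float columns alone:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data (illustrative values only).
df = pd.DataFrame({
    "age": [39.0, 50.0, 38.0],
    "workclass": ["Private", " ?", "State-gov"],
    "native-country": ["United-States", "Cuba", "?"],
})

# Match cells that consist solely of '?' with optional surrounding whitespace,
# turn them into NaN across every column at once, then drop the affected rows.
cleaned = df.replace(r"^\s*\?\s*$", np.nan, regex=True).dropna()

print(cleaned)
```

The pattern is anchored with ^ and $ so that a value which merely contains a question mark somewhere inside it is left untouched.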
OK, previously I had uploaded the file wrong; it contained spaces everywhere.

Here it is again:
Refer to [16] for '?'. Now when I use your code, it either removes all content or replaces the entire dataframe with 'nan'.
Did you try something like this?

import numpy as np  # pd.np was deprecated and later removed from pandas
df.loc[df['native-country'].str.contains('?', regex=False), 'native-country'] = np.nan
(Aug-16-2019, 01:12 AM)scidam Wrote: Did you try something like this?

df.loc[df['native-country'].str.contains('?', regex=False), 'native-country'] = np.nan

It worked wonderfully, thank you
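For anyone landing here later: if the file really does have spaces after every comma, another option may be to handle the '?' markers at load time instead of afterwards. The sketch below is an assumption about the file layout, not the poster's actual data; the inline CSV text and column names are invented for illustration. It uses the na_values and skipinitialspace parameters of pd.read_csv:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file (illustrative values only).
raw = io.StringIO(
    "age, workclass, native-country\n"
    "39, Private, United-States\n"
    "50, ?, Cuba\n"
    "38, State-gov, ?\n"
)

# skipinitialspace drops the blank after each comma, so ' ?' is read as '?';
# na_values then tells the parser to treat '?' as missing in every column.
train = pd.read_csv(raw, na_values="?", skipinitialspace=True).dropna()

print(train)
```

With a real file, the StringIO object would be swapped for the file path. Doing it this way covers every column in one pass, so no per-column loop is needed.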