Python Forum

Full Version: DataFrame.astype('category') duplicates column
Hi. I have a problem converting a column from object type to category.
My data shape is (1000000, 6): [Date, object, object, object, int64, column_1].
When I use the code below, it duplicates the last column, column_1.

df.column_1 = df.column_1.astype('category')

Before the conversion the column is of object type; after the conversion it shows category, but it is already duplicated.

One more point: the label of the duplicated column has whitespace at the end of it.


Thanks in advance.
(Apr-18-2018, 05:31 AM)garikhgh0 Wrote: Before the conversion the column is of object type; after the conversion it shows category, but it is already duplicated.

You can get unique values of the categorical column as follows:

df.column_1 = df.column_1.astype('category')
df.column_1.cat.categories #unique categories
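To check whether the conversion itself adds a column, you can compare the shape of the frame before and after. This is only a minimal sketch on a tiny made-up frame (the data and the "value" column are my assumptions, not the original poster's data), and it uses bracket assignment instead of attribute assignment:

```python
import pandas as pd

# Hypothetical stand-in for the real data: one object column, one int column.
df = pd.DataFrame({"column_1": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})

shape_before = df.shape

# Bracket assignment replaces the existing column in place.
df["column_1"] = df["column_1"].astype("category")

shape_after = df.shape
categories = list(df["column_1"].cat.categories)  # unique categories

print(shape_before, shape_after)
print(df.dtypes)
```

If the two shapes match on your data as well, the extra column was probably already there before the conversion.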
(Apr-18-2018, 05:31 AM)garikhgh0 Wrote: The label of the duplicated column has whitespace at the end of it.
I didn't understand this part, but if you want to remove duplicates from the original data frame, you can use the drop_duplicates method,
e.g.
df = df.drop_duplicates(['column_1'])  # or append .reset_index(drop=True) if needed 
# removes rows with duplicated values in column_1
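On the trailing whitespace: one way to see whether two column labels differ only by padding is to print them with repr and compare the stripped labels. The sketch below assumes a hypothetical frame with the labels "column_1" and "column_1 " (I don't know how the real frame looks), and it keeps the first column of each stripped label:

```python
import pandas as pd

# Hypothetical frame: the second label is the same name plus a trailing space.
df = pd.DataFrame({"column_1": ["a", "b"], "column_1 ": ["a", "b"]})

# repr() makes trailing whitespace visible in the printed labels.
print([repr(c) for c in df.columns])

# Labels that change when stripped carry leading or trailing whitespace.
padded = [c for c in df.columns if c != c.strip()]

# Keep only the first column for each stripped label, then clean the labels.
stripped = df.columns.str.strip()
deduped = df.loc[:, ~stripped.duplicated()].copy()
deduped.columns = deduped.columns.str.strip()

print(padded)
print(list(deduped.columns))
```

Note that this drops the padded column without checking that its values match the other one, so compare the two columns first if that matters.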
Thanks a lot. I would also mention that after converting object columns to category, DataFrame.pivot_table does not work correctly: it creates duplicates.
(Apr-18-2018, 07:33 AM)garikhgh0 Wrote: DataFrame.pivot_table does not work correctly

I tried to reproduce this, but couldn't find the error:

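import numpy as np  # pd.np was removed in pandas 2.0, so import NumPy directly
import pandas as pd
# 1000 random integers in [0, 100) and 1000 random labels from 'a', 'b', 'c'
data = pd.DataFrame({'x': np.random.randint(0, 100, 1000), 'y': np.random.choice(['a', 'b', 'c'], 1000)})
# sum of x for each value of y
pd.pivot_table(data, aggfunc='sum', values='x', columns=['y'])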
Output:
y      a      b      c
x  16924  16650  16377
# change column type
data.y = data.y.astype('category')
pd.pivot_table(data, aggfunc='sum', values='x', columns=['y'])
# the result is the same...
Output:
y      a      b      c
x  16924  16650  16377
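The same comparison can be made repeatable with a fixed seed, so that both pivots are built from identical data and can be compared in code. This is a rough sketch, not a diagnosis: the seed and the explicit observed=True are my own choices (the latter so the result does not depend on the pandas version's default for categorical keys).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (my assumption) for repeatable data
data = pd.DataFrame({
    "x": rng.integers(0, 100, 1000),
    "y": rng.choice(["a", "b", "c"], 1000),
})

# Pivot while y is still a plain string/object column.
before = pd.pivot_table(data, values="x", columns=["y"], aggfunc="sum", observed=True)

# Convert y to category and pivot again on the same data.
data["y"] = data["y"].astype("category")
after = pd.pivot_table(data, values="x", columns=["y"], aggfunc="sum", observed=True)

same_values = np.array_equal(before.to_numpy(), after.to_numpy())
same_labels = list(before.columns) == list(after.columns)
print(same_values, same_labels)
```

If either flag comes out False on your real data, posting a small sample of that data would help to reproduce the problem.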