Python Forum

[pandas] Convert categorical data to numbers
Hello,

I have a DataFrame df_train with a column sub_division.

The values in the column look like this:

ABC_commercial
ABC_Private
Test ROM DIV
ROM DIV
TEST SEC ROM

I am trying to:
1. convert anything that starts with ABC (ABC*) to a number (e.g. 1)
2. convert anything that contains ROM to a number (e.g. 2)

Can you suggest a way to do this, please?

Thanks in advance.
A possibility that might be useful for you:
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'sub_division': ['ABC_commercial', 'ABC_Private', 'Test ROM DIV', 'ROM DIV', 'TEST SEC ROM']})

# One 0/1 indicator column per pattern:
# ABC is 1 if the value starts with "ABC", ROM is 1 if the value contains "ROM"
df['ABC'] = df['sub_division'].str.startswith('ABC').astype(int)
df['ROM'] = df['sub_division'].str.contains('ROM', regex=False).astype(int)

print(df)
Output:
     sub_division  ABC  ROM
0  ABC_commercial    1    0
1     ABC_Private    1    0
2    Test ROM DIV    0    1
3         ROM DIV    0    1
4    TEST SEC ROM    0    1
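The code above produces two separate 0/1 indicator columns. If you want a single column that holds 1 for the ABC values and 2 for the ROM values, as in the question, numpy.select is one option. This is a minimal sketch under a few assumptions: the new column name sub_division_code is hypothetical, rows that match neither pattern get 0 (an arbitrary default), and the extra row 'OTHER' is a made-up value added only to show that default. numpy.select uses the first condition that is True for each row, so a value that both starts with ABC and contains ROM would get 1.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical 'OTHER' row to show the default
df = pd.DataFrame({'sub_division': ['ABC_commercial', 'ABC_Private', 'Test ROM DIV',
                                    'ROM DIV', 'TEST SEC ROM', 'OTHER']})

# Each condition is a boolean Series; na=False treats missing values as "no match"
conditions = [
    df['sub_division'].str.startswith('ABC', na=False),
    df['sub_division'].str.contains('ROM', regex=False, na=False),
]
codes = [1, 2]

# First True condition wins per row; rows matching nothing get default=0 (assumed)
df['sub_division_code'] = np.select(conditions, codes, default=0)

print(df)  # sub_division_code column: 1, 1, 2, 2, 2, 0
```

To give ROM precedence over ABC instead, swap the order of the entries in both conditions and codes.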