Python Forum

Convert dataframe string column to numeric in Python
Hello,
I loaded some sample data from a URL into a dataframe and then split it into named columns. When I tried to perform some calculations, I realised that the columns 'Dage' and 'Cat_Ind' are strings, not numbers. How can I convert them to numeric so that I can continue with the analysis?

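import pandas as pd

# Each line of the file is read as one string column
df2 = pd.read_csv("http://users.stat.ufl.edu/~winner/data/agedeath.dat", header=None)
df2.columns = ["col1"]
print(df2)
# Split on whitespace; every resulting column still holds strings
df3 = df2.col1.str.split(expand=True)
df3.columns = ["Cat", "Dage", "Cat_Ind"]
print(df3)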
The resulting dataframe is:
Output:
       Cat Dage Cat_Ind
0     aris   21       1
1     aris   21       2
2     aris   21       3
3     aris   21       4
4     aris   21       5
...    ...  ...     ...
6181  sovr   95    1436
6182  sovr   95    1437
6183  sovr   97    1438
6184  sovr  100    1439
6185  sovr  101    1440

[6186 rows x 3 columns]
Here is the problem. When I run the code below:
df3['Dage'].min()
I get this output:
Output:
'100'
I suspect this is because the 'Dage' column holds strings rather than integers.
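That suspicion can be checked on a small scale. The sketch below uses a made-up Series (the name ages and its four values are only illustrative, not the real data) to show that min() on strings compares character by character, so '100' sorts before '21':

```python
import pandas as pd

# Hypothetical stand-in for the 'Dage' column: numbers stored as strings.
ages = pd.Series(["21", "95", "100", "101"])

# Strings are compared lexicographically: "1" < "2", so "100" < "21".
string_min = ages.min()
print(string_min)   # 100 (a string)

# After converting to numbers, the comparison is numeric.
numeric_min = pd.to_numeric(ages).min()
print(numeric_min)  # 21
```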

Q. How do I convert this column to integer values? Please help.
Did you even look at the docs? read_csv takes a dtype parameter that lets you specify the types of the columns: https://pandas.pydata.org/docs/user_guid...-csv-table.
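Note that dtype only helps if read_csv does the column splitting itself. Here is a rough sketch of two possible routes, using a few rows copied from the output above in place of the real file. It assumes the file is plain whitespace-delimited, which is why sep=r"\s+" is used; that is a guess based on the str.split call in the question, not something checked against the actual file.

```python
import io
import pandas as pd

# A few rows copied from the printed output, standing in for the real file.
sample = io.StringIO(
    "aris 21 1\n"
    "aris 21 2\n"
    "sovr 100 1439\n"
    "sovr 101 1440\n"
)

# Option 1: let read_csv split on whitespace and set the types up front.
df = pd.read_csv(
    sample,
    sep=r"\s+",                       # assumption: whitespace-delimited file
    header=None,
    names=["Cat", "Dage", "Cat_Ind"],
    dtype={"Cat": str, "Dage": int, "Cat_Ind": int},
)
print(df["Dage"].min())   # 21

# Option 2: convert an already-split dataframe of strings, like df3.
df3 = pd.DataFrame(
    {"Cat": ["aris", "sovr"], "Dage": ["21", "100"], "Cat_Ind": ["1", "1439"]}
)
df3[["Dage", "Cat_Ind"]] = df3[["Dage", "Cat_Ind"]].apply(pd.to_numeric)
print(df3["Dage"].min())  # 21
```

For the real data, the StringIO object in option 1 would be replaced by the URL from the question. Option 2 works directly on the existing df3; df3["Dage"].astype(int) is an alternative when every value is known to be a clean integer string.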