Python Forum
Converting Dataframe in Python from Object to Float
#1
Hi there,
I am pretty new to Python and I have a dataframe with object-type columns (see image below). They are supposed to be prices of financial assets and are in the form 13,754.34. At the moment they are object, but I need to convert them to float, otherwise I can't do any operations on them (I mainly need to plot those series and calculate correlations).
I tried astype, I tried to_numeric and so on, but none of those work.
Can anybody help me fix this, please? Many thanks in advance.

[Image: a6ce1be1585a0ed193aa5220ea5c77d7f9d8c018_2_1380x794.png]
#2
(Aug-28-2019, 03:59 AM)marco_ita Wrote: I tried astype , i tried to_numeric and so on but none of those work.
Please show us how you tried these and what was the outcome or error.
In addition, did you get these data using pandas.read_csv() or similar function?
If yes, have a look at its documentation: there's a dtype= parameter which is presumably helpful.
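For anyone wanting to show the failure as text rather than as a screenshot, here is a minimal sketch that reproduces it. The sample values are made up to mimic the "13,754.34" format from the first post; only the column name comes from later in the thread.

```python
import pandas as pd

# Hypothetical sample: prices stored as text with thousands separators.
prices = pd.Series(["13,754.34", "13,801.10", "13,690.00"],
                   name="GER30_Open", dtype=object)

# astype(float) raises because "13,754.34" is not a valid float literal.
try:
    prices.astype(float)
    astype_error = None
except ValueError as exc:
    astype_error = str(exc)
print("astype failed:", astype_error)

# errors="coerce" turns unparseable strings into NaN instead of raising,
# which makes it easy to count how many values are affected.
coerced = pd.to_numeric(prices, errors="coerce")
print(coerced.isna().sum(), "of", len(prices), "values could not be parsed")
```

Posting output like this, together with df.dtypes, usually makes it much quicker to see what is wrong.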
#3
the error i get with astype
[Image: vvAzpt]

the error i get with to_numeric
[Image: vvAfAG]

click on the broken images above and the screenshots will open up. Thanks.
#4
You need to get rid of the commas in the strings
df['GER30_Open_float'] = df['GER30_Open'].str.replace(',', '').astype(float)
As already mentioned, if you are using pandas.read_csv() you can pass the parameter thousands=',' so the commas are removed while reading:
df = pandas.read_csv('example.csv', thousands=',')
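Both approaches can be tried side by side without a file on disk. This is only a sketch: the CSV text and the Date column are invented for illustration, and the price column is quoted because its values contain commas.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = 'Date,GER30_Open\n2019-08-26,"11,650.25"\n2019-08-27,"13,754.34"\n'

# Option 1: read the column as text, strip the separators, then convert.
df = pd.read_csv(io.StringIO(csv_text), dtype={"GER30_Open": str})
df["GER30_Open_float"] = (
    df["GER30_Open"].str.replace(",", "", regex=False).astype(float)
)

# Option 2: let the parser handle the separators while reading.
df2 = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df["GER30_Open_float"].tolist())
print(df2["GER30_Open"].dtype)
```

Option 2 is less code when the data comes from a CSV anyway; option 1 is the one to use when the dataframe already exists in memory.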
#5
Thank you so much Thomas, really appreciate your help ! :)
#6
Then please like the post and give reputation if you think I deserve it ;)
#7
import pandas as pd
from sklearn import tree
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

df = pd.read_csv('ADM-cmiyc_data-small.csv', index_col=0)

# Split the independent variables from the dependent variable.
X = df.drop(['target'], axis=1)
y = df['target']

# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=12)

# Build and fit the model.
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

# Predict the class of the test samples.
y_pred = clf.predict(X_test)

# Print the confusion matrix and the classification report.
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
print('\nClassification_Report:')
print(classification_report(y_test, y_pred))

# Cross-validate on the training data.
print(cross_val_score(clf, X_train, y_train, cv=5))

The error is: could not convert string to float: '2013-11-21 14:40:57'
#8
First, please enclose your code in Python tags - that will preserve indentation which is sooo important in figuring out Python code.

You have a column in the dataframe that is a timestamp. You need to either drop that column before feeding it to the classifier or find a way to map that to a numeric value if that info is important to your analysis.

Right after you define X and y try
X.head()
to see what columns you have and what troublesome formats there are.
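Besides eyeballing X.head(), you can also ask pandas directly which columns are not numeric. A small sketch, using a made-up miniature of the frame (the column names follow the ones posted in this thread, the values are only examples):

```python
import pandas as pd

# Hypothetical miniature of X: numeric site ids plus text timestamps.
X = pd.DataFrame({
    "site1": [27320, 77],
    "time1": ["2014-03-24 16:31:05", "2014-04-15 16:49:30"],
    "site2": [27320.0, 76.0],
})

# Columns that are not numeric are the ones the classifier cannot convert.
non_numeric_cols = X.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```

Anything that shows up in that list has to be dropped or converted before calling fit().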
#9
Actually, I used get_dummies to convert it, but there is too much data and it hangs the system.



site1 time1 site2 time2 site3 time3 site4 time4 site5 time5 ... time6 site7 time7 site8 time8 site9 time9 site10 time10 target
session_id
32756 27320 2014-03-24 16:31:05 27320.0 2014-03-24 16:31:06 12619.0 2014-03-24 16:31:06 29.0 2014-03-24 16:31:06 27320.0 2014-03-24 16:31:07 ... 2014-03-24 16:31:07 12619.0 2014-03-24 16:31:08 2856.0 2014-03-24 16:31:08 2856.0 2014-03-24 16:31:09 12613.0 2014-03-24 16:31:10 Alice
145869 77 2014-04-15 16:49:30 76.0 2014-04-15 16:59:54 879.0 2014-04-15 17:04:10 52.0 2014-04-15 17:05:24 80.0 2014-04-15 17:07:24 ... 2014-04-15 17:13:16 77.0 2014-04-15 17:14:52 NaN NaN NaN NaN NaN NaN Alice
165002 335 2014-03-24 16:48:27 77.0 2014-03-24 16:48:29 1355.0 2014-03-24 16:48:30 77.0 2014-03-24 16:48:30 76.0 2014-03-24 16:48:36 ... 2014-03-24 16:48:36 21.0 2014-03-24 16:48:36 77.0 2014-03-24 16:48:36 21.0 2014-03-24 16:48:37 80.0 2014-03-24 16:48:37 Alice
179723 75 2014-03-24 16:57:07 80.0 2014-03-24 16:57:07 76.0 2014-03-24 16:57:09 82.0 2014-03-24 16:57:21 22.0 2014-03-24 16:57:21 ... 2014-03-24 16:57:21 335.0 2014-03-24 16:57:21 76.0 2014-03-24 16:57:22 2733.0 2014-03-24 16:57:22 80.0 2014-03-24 16:57:22 Alice
39311 21 2014-02-17 16:25:19 21.0 2014-02-17 16:25:28 3132.0 2014-02-17 16:25:31 35.0 2014-02-17 16:25:31 21.0 2014-02-17 16:25:31 ... 2014-02-17 16:25:31 33.0 2014-02-17 16:25:31 27262.0 2014-02-17 16:25:31 19166.0 2014-02-17 16:25:31 29.0 2014-02-17 16:25:31 Alice
5 rows × 21 columns
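A likely reason get_dummies struggles here: it creates one column per distinct value, and second-resolution timestamps are almost all distinct. A rough sketch with synthetic timestamps (not the real data) shows the effect:

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical column: 1000 timestamps one second apart, stored as text.
start = datetime(2014, 3, 24, 16, 31, 5)
times = pd.Series(
    [str(start + timedelta(seconds=i)) for i in range(1000)],
    name="time1",
)

# One dummy column is created for every distinct timestamp string.
dummies = pd.get_dummies(times)
print(times.nunique(), dummies.shape)
```

With ten such columns and many more rows, the result grows far too wide to fit in memory comfortably, which would match the hang described above.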
#10
So you have a ton of datetime columns. I don't know your application, so the options are: drop the columns, do one-hot encoding, or sort the date/times into bins. Either way you need something that the analyzers can handle, and those text-format datetimes are a problem.
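Two of those options could look roughly like the sketch below. It assumes the timestamp columns are the ones whose names start with "time" (as in the posted output), and it picks hour and day of week as coarse bins; both choices are just illustrations, not something the data dictates.

```python
import pandas as pd

# Hypothetical miniature of X with text timestamps and a missing value.
X = pd.DataFrame({
    "site1": [27320, 77],
    "time1": ["2014-03-24 16:31:05", "2014-04-15 16:49:30"],
    "site2": [27320.0, 76.0],
    "time2": ["2014-03-24 16:31:06", None],
})

# Assumption: every timestamp column name starts with "time".
time_cols = [c for c in X.columns if c.startswith("time")]

# Option 1: drop the timestamp columns entirely.
X_dropped = X.drop(columns=time_cols)

# Option 2: replace each timestamp with a few coarse numeric features.
X_numeric = X.drop(columns=time_cols)
for col in time_cols:
    parsed = pd.to_datetime(X[col], errors="coerce")  # missing values become NaT
    X_numeric[col + "_hour"] = parsed.dt.hour
    X_numeric[col + "_dayofweek"] = parsed.dt.dayofweek  # Monday is 0

print(X_dropped.columns.tolist())
print(X_numeric.columns.tolist())
```

Note that missing timestamps turn into NaN in the derived columns, so those still need to be filled or handled before the frame goes into the classifier.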