Python Forum
ValueError: could not convert string to float: '4 AVENUE'
#1
I tried to run a linear regression using
regr = linear_model.LinearRegression()
regr.fit(X, y)
My data contains one column in DateTime format and another holding physical addresses, such as '8300 4 AVENUE 1'. When I ran the code, I received the following error:
ValueError                                Traceback (most recent call last)
<ipython-input-119-8a11d5d4a70e> in <module>
      1 regr = linear_model.LinearRegression()
----> 2 regr.fit(X, y)

~\New\Anaconda3\lib\site-packages\sklearn\linear_model\base.py in fit(self, X, y, sample_weight)
    456         n_jobs_ = self.n_jobs
    457         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 458                          y_numeric=True, multi_output=True)
    459 
    460         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

~\New\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    754                     ensure_min_features=ensure_min_features,
    755                     warn_on_dtype=warn_on_dtype,
--> 756                     estimator=estimator)
    757     if multi_output:
    758         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\New\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    565         # make sure we actually converted to numeric:
    566         if dtype_numeric and array.dtype.kind == "O":
--> 567             array = array.astype(np.float64)
    568         if not allow_nd and array.ndim >= 3:
    569             raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: could not convert string to float: '4 AVENUE'
I decided to drop the datetime column at this stage but I need the address column for my analysis.
Please, do help me.
Thank you in advance

I also tried to convert the address column to float, but that turned every value in the column into NaN, which made the whole process useless.
#2
In general, linear regression expects numeric inputs, so you need to do some feature engineering first. For example, you can convert addresses to coordinates (lat and lon), if that makes sense for the problem you're trying to solve. You can also build a separate regression model for each address you have. You can handle dates as shown here.
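A minimal sketch of the feature-engineering idea above, assuming a toy DataFrame: the column names mirror the NYC 311 export discussed in this thread, but the values and the `target` column are made up for illustration. It replaces the datetime column with integer parts and uses coordinates in place of the address text, so every input to the model is numeric.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: values and the "target" column are invented.
df = pd.DataFrame({
    "created_date": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-02 09:30",
        "2020-01-03 18:15", "2020-01-04 23:45",
    ]),
    "latitude": [40.70, 40.71, 40.72, 40.73],
    "longitude": [-73.95, -73.94, -73.93, -73.92],
    "target": [1.0, 2.0, 3.0, 4.0],  # any numeric quantity to predict
})

# The .dt accessors return integers (strftime would return strings).
features = pd.DataFrame({
    "hour": df["created_date"].dt.hour,
    "day": df["created_date"].dt.day,
    "month": df["created_date"].dt.month,
    "year": df["created_date"].dt.year,
    "latitude": df["latitude"],    # coordinates stand in for the address text
    "longitude": df["longitude"],
})

# Every column is numeric, so fit() no longer has a string to convert.
model = LinearRegression().fit(features, df["target"])
predictions = model.predict(features)
```

Whether coordinates are a sensible stand-in for the address depends on the question being asked; they capture location but not, say, building identity.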
#3
Thank you, scidam. I've been able to deal with the timestamp problem. My challenge is, I'm trying to predict a string variable and so expect the system to convert it to float.
Actually, I'm using this (in)famous NYC-311 Service complaints data and need to predict the number of future complaints (for a particular complaint type I'd identified earlier). Everything goes well but, even when I tried to convert the string-formatted complaint type dependent variable, it gives me this same error message:
ValueError: could not convert string to float: 'HEATING'
Please, is there any other way to treat this variable?
Below is a summary of my work so far:
# Imports needed by the code below
import os
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Fetching the data (read the cached copy if it exists, otherwise download and cache it)
source = 'https://data.cityofnewyork.us/resource/erm2-nwe9.csv?$limit=10000000&Agency=HPD&$select=created_date,unique_key,complaint_type,Descriptor,incident_zip,incident_address,street_name,address_type,city,resolution_description,borough,latitude,longitude,closed_date,location_type,status'
if os.path.isfile('./assets/csr/erm2-nwe9.csv'):
    my_data = pd.read_csv('./assets/csr/erm2-nwe9.csv', sep=',', parse_dates=['created_date', 'closed_date'], low_memory=False, index_col = [0])
else:
    my_data = pd.read_csv(source, sep=',', parse_dates=['created_date', 'closed_date'], low_memory=False, index_col = [0])
    # to_csv has no index_col parameter; the index is written by default
    my_data.to_csv('./assets/csr/erm2-nwe9.csv')
# Identifying the commonest complaint type:
my_data1 = my_data.loc[my_data['complaint_type']=='HEATING'].dropna()
 
# Dealing with DateTime:
my_data1['created_date'] = pd.to_datetime(my_data1['created_date'],errors="coerce")
my_data1['Hour'] = my_data1["created_date"].dt.strftime('%H')    
my_data1['Day'] = my_data1["created_date"].dt.strftime('%d')    
my_data1['Month'] = my_data1["created_date"].dt.strftime('%m')    
my_data1['Year'] = my_data1["created_date"].dt.strftime('%Y')  
# Dropping unnecessary columns
my_data_1 = my_data1.drop(['unique_key', 'created_date', 'incident_address', 'street_name', 'address_type', 
        'city', 'resolution_description', 'location_type', 'borough', 'closed_date', 'status'], axis = 1)
my_data_1 = my_data_1.dropna()
#Splitting the data
X=my_data_1.loc[:,my_data_1.columns != "complaint_type"]
y=my_data_1["complaint_type"]
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=0)
# Standardizing the data
sc = StandardScaler()  
X_trainset = sc.fit_transform(X_trainset)  
X_testset = sc.transform(X_testset)
# Running the regression model on the standardized training split
regr = linear_model.LinearRegression()
regr.fit(X_trainset, y_trainset)
And here's where the problems begin.
Please, do help.
Thank you
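For the goal stated in this post (predicting the number of future complaints), one possible reading is that the regression target should be a count per time period rather than the complaint_type string itself. The sketch below is only an illustration under that assumption: the toy rows, the daily granularity, and the names `complaints`, `daily_counts`, `X` and `y` are all hypothetical.

```python
import pandas as pd

# Hypothetical toy rows standing in for the filtered HEATING complaints.
complaints = pd.DataFrame({
    "created_date": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 21:10", "2020-01-02 07:45",
        "2020-01-03 12:00", "2020-01-03 13:30", "2020-01-03 19:20",
    ]),
    "complaint_type": ["HEATING"] * 6,
})

# Count complaints per calendar day; this count is a numeric target.
daily_counts = complaints.set_index("created_date").resample("D").size()

# Simple numeric calendar features derived from the daily index.
X = pd.DataFrame({
    "dayofweek": daily_counts.index.dayofweek,  # Monday=0 ... Sunday=6
    "day": daily_counts.index.day,
})
y = daily_counts.to_numpy()
```

With a target built this way, `y` holds integers, so a regressor has nothing to convert from string to float.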
#4
Consider one-hot encoding. See if this helps:
One hot encoding a feature in a dataframe
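A minimal sketch of one-hot encoding a categorical feature with `pandas.get_dummies`, assuming a small made-up DataFrame; the `borough` column name comes from the dataset in this thread, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with one categorical and one numeric column.
df = pd.DataFrame({
    "borough": ["BROOKLYN", "BRONX", "BROOKLYN", "QUEENS"],
    "latitude": [40.65, 40.85, 40.68, 40.73],
})

# Replace "borough" with one 0/1 indicator column per distinct value.
encoded = pd.get_dummies(df, columns=["borough"], dtype=float)

print(encoded.columns.tolist())
# ['latitude', 'borough_BRONX', 'borough_BROOKLYN', 'borough_QUEENS']
```

This suits columns with a handful of distinct values. A free-text address column would likely produce one indicator per address, which is rarely useful.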
#5
(Jan-26-2020, 12:34 PM)jefsummers Wrote: Consider one-hot encoding. See if this helps: One hot encoding a feature in a dataframe
Thank you, jefsummers

