Python Forum
Why is my train and test accuracy so low?
#1
So I am studying this dataset: https://archive.ics.uci.edu/ml/datasets/...ng+Dataset

My code and dataset can be found here: https://www.4shared.com/rar/zJATZXhxiq/ML_Lab.html

Overview of the dataset:

Bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.

Apart from the interesting real-world applications of bike-sharing systems, the characteristics of the data they generate make them attractive for research. As opposed to other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike-sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

After running Linear Regression, I get an accuracy (the value returned by score) of only 8% on both train and test... Is this normal? Should I try another regression algorithm, such as Random Forest, to increase the accuracy?
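
For context, the score method of scikit-learn regressors returns the coefficient of determination (R^2), not a classification-style accuracy. Below is a minimal sketch on synthetic data (an assumption for illustration, not the bike-sharing data; the names X_demo and y_demo are made up) showing that score and r2_score agree:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (illustration only, not the bike-sharing data):
# the target depends weakly on one feature and is dominated by noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 1))
y_demo = 0.3 * X_demo[:, 0] + rng.normal(size=500)

model = LinearRegression().fit(X_demo, y_demo)

# For regressors, .score() is the coefficient of determination (R^2).
score_value = model.score(X_demo, y_demo)
r2_value = r2_score(y_demo, model.predict(X_demo))

print(f"score():    {score_value:.3f}")
print(f"r2_score(): {r2_value:.3f}")
```

A low value here means the features explain little of the variance in the target; it does not mean that only a small share of predictions are "correct".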

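Some code after renaming and one-hot encoding the data:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# linear_regression is our own implementation (defined elsewhere);
# new_df is the renamed, one-hot encoded DataFrame.
feature_cols = [
    'workingday_weekend/holiday',
    'holiday_Holiday',
    'weekday_Monday',
    'weekday_Tuesday',
    'weekday_Wednesday',
    'weekday_Thirsday',
    'weekday_Friday',
    'weekday_Saturday',
    'weekday_Sunday',
    'season_Fall',
    'season_Spring',
    'season_Summer',
    'season_Winter',
    'weather_situation_Clear, Few clouds, Partly cloudy, Partly cloudy',
    'weather_situation_Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog',
    'weather_situation_Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds',
    'weather_situation_Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist',
]
X = new_df[feature_cols]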
y = new_df['count']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

our_regressor = linear_regression(X_train, y_train).fit()
sklearn_regressor = LinearRegression().fit(X_train, y_train)

our_train_accuracy = our_regressor.score()
sklearn_train_accuracy = sklearn_regressor.score(X_train, y_train)

our_test_accuracy = our_regressor.score(X_test, y_test)
sklearn_test_accuracy = sklearn_regressor.score(X_test, y_test)

# Train the model on the training split only
lm = LinearRegression()
lm.fit(X_train, y_train)

# Predict on the held-out test split created by train_test_split above
y_pred = lm.predict(X_test)

# Compute the mean absolute error for our implementation
our_regressor = linear_regression(X_train, y_train).fit()
our_mae = our_regressor.meanAbsoluteError(y_test, y_pred)

# Compute the mean absolute error for Sklearn
mae = mean_absolute_error(y_test, y_pred)

# Compute the root-mean-squared error for our implementation
our_rms = our_regressor.rootMeanSquaredError(y_test, y_pred)

# Compute the root-mean-squared error for Sklearn
rms = np.sqrt(mean_squared_error(y_test, y_pred))

# Compute the coefficient of determination for our implementation
our_r2 = our_regressor.r2(y_test, y_pred)

# Compute the coefficient of determination for Sklearn
r2 = r2_score(y_test, y_pred)

# Side-by-side comparison of both implementations
pd.DataFrame([[our_train_accuracy, sklearn_train_accuracy],
              [our_test_accuracy, sklearn_test_accuracy],
              [our_mae, mae],
              [our_rms, rms],
              [our_r2, r2]],
             index=['Train Accuracy', 'Test Accuracy', 'MAE', 'RMS', 'R2'],
             columns=['Our Implementation', "Sklearn's Implementation"])
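
On the Random Forest question, a rough sketch of how the comparison could be set up is given below. It uses synthetic stand-in data (hypothetical, not the bike-sharing data) in which the target depends non-linearly on an hour-of-day style feature; the names hour, X_syn and y_syn are assumptions made for this example only. Whether a forest helps on the real data depends on which features are included, so this shows the mechanics rather than an expected result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical, not the bike-sharing data):
# the target rises towards midday and falls again, so it is not linear in hour.
rng = np.random.default_rng(42)
hour = rng.integers(0, 24, size=2000)
X_syn = hour.reshape(-1, 1).astype(float)
y_syn = 100 * np.sin(np.pi * hour / 24) ** 2 + rng.normal(scale=10, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42)

# Fit both models on the same training split.
linear_model = LinearRegression().fit(X_tr, y_tr)
forest_model = RandomForestRegressor(
    n_estimators=100, random_state=0).fit(X_tr, y_tr)

# .score() returns R^2 for both regressors; compare on the held-out split.
linear_r2 = linear_model.score(X_te, y_te)
forest_r2 = forest_model.score(X_te, y_te)

print(f"LinearRegression test R^2:      {linear_r2:.3f}")
print(f"RandomForestRegressor test R^2: {forest_r2:.3f}")
```

Both models are scored on the same held-out split, so the two R^2 values are directly comparable.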