Why is my train and test accuracy so low?

python420 · (This post was last modified: Dec-09-2019, 01:58 AM by Larz60+.)

So I am studying this dataset: https://archive.ics.uci.edu/ml/datasets/...ng+Dataset

My code and dataset can be found here: https://www.4shared.com/rar/zJATZXhxiq/ML_Lab.html

Overview of the dataset:

Bike sharing systems are a new generation of traditional bike rentals where the whole process of membership, rental, and return has become automatic. Through these systems, a user can easily rent a bike from one position and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental, and health issues.

Apart from interesting real-world applications of bike sharing systems, the characteristics of the data generated by these systems make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

After fitting Linear Regression on the training set and scoring it on both the train and test sets, I found an accuracy of only 8%... Is this normal? Should I try another regression algorithm, such as Random Forest, to increase the accuracy?
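
For context on what that 8% measures: as far as I understand, score() on an sklearn regressor returns the coefficient of determination R^2 rather than a classification accuracy, so 8% would mean the model explains roughly 8% of the variance in the count. Below is a minimal sketch of that quantity in plain Python. The helper name r2 and the numbers are made up for illustration and are not taken from the dataset.

```python
from statistics import mean

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up counts, for illustration only (not the bike sharing data).
y_true = [10.0, 20.0, 30.0, 40.0]

# Predicting the mean for every row explains none of the variance.
baseline = [mean(y_true)] * len(y_true)

# Perfect predictions explain all of it.
perfect = list(y_true)

print(r2(y_true, baseline), r2(y_true, perfect))
```

Read this way, a score near 0 says the model is barely better than always predicting the average count.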

A code snippet, after renaming the columns and one-hot encoding the data:

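import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# linear_regression (used below) is our own implementation from the lab code

# One-hot encoded features
X = new_df[['workingday_weekend/holiday', 'holiday_Holiday',
            'weekday_Monday', 'weekday_Tuesday', 'weekday_Wednesday', 'weekday_Thirsday',
            'weekday_Friday', 'weekday_Saturday', 'weekday_Sunday',
            'season_Fall', 'season_Spring', 'season_Summer', 'season_Winter',
            'weather_situation_Clear, Few clouds, Partly cloudy, Partly cloudy',
            'weather_situation_Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog',
            'weather_situation_Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds',
            'weather_situation_Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist']]
# Target: rental count
y = new_df['count']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)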

our_regressor = linear_regression(X_train, y_train).fit()
sklearn_regressor = LinearRegression().fit(X_train, y_train)

# Note: sklearn's score() returns R^2 for regressors, not a classification accuracy
our_train_accuracy = our_regressor.score()
sklearn_train_accuracy = sklearn_regressor.score(X_train, y_train)

our_test_accuracy = our_regressor.score(X_test, y_test)
sklearn_test_accuracy = sklearn_regressor.score(X_test, y_test)

# Train the model on the training split only
lm = LinearRegression()
lm.fit(X_train, y_train)

# Predict on the held-out test split (X_test, y_test come from train_test_split above)
y_pred = lm.predict(X_test)

# Compute the mean absolute error (MAE) for our implementation
# (the variable names say "mse", but the metric computed is MAE)
our_regressor = linear_regression(X_train, y_train).fit()
our_mse = our_regressor.meanAbsoluteError(y_test, y_pred)

# Compute the mean absolute error (MAE) for Sklearn
mse = mean_absolute_error(y_test, y_pred)

# Compute the root-mean-squared error (RMSE) for our implementation
our_rms = our_regressor.rootMeanSquaredError(y_test, y_pred)

# Compute the root-mean-squared error (RMSE) for Sklearn
rms = np.sqrt(mean_squared_error(y_test, y_pred))

# Compute the coefficient of determination for our implementation
our_r2 = our_regressor.r2(y_test, y_pred)

# Compute coefficient of determination for Sklearn
r2 = r2_score(y_test, y_pred)

pd.DataFrame([[our_train_accuracy, sklearn_train_accuracy],
              [our_test_accuracy, sklearn_test_accuracy],
              [our_mse, mse],  # these hold the mean absolute error
              [our_rms, rms],
              [our_r2, r2]],
             index=['Train Accuracy', 'Test Accuracy', 'MAE', 'RMSE', 'R2'],
             columns=['Our Implementation', "Sklearn's Implementation"])
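
On the Random Forest question, one way to check is to fit both models on the same split and compare the test score. The sketch below does that on synthetic, deliberately non-linear data, since I have not run it on the bike sharing features. The names X_demo, y_demo, linear_model and forest_model are made up for the example. Whether a forest helps on the real data depends on how much signal the chosen features carry, so treat this as a template rather than a prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, deliberately non-linear data (NOT the bike sharing data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3.0, 3.0, size=(500, 2))
y_demo = X_demo[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

# Same kind of 80/20 split as in the post.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

linear_model = LinearRegression().fit(X_tr, y_tr)
forest_model = RandomForestRegressor(
    n_estimators=100, random_state=42).fit(X_tr, y_tr)

# score() is R^2 for both regressors, evaluated on the held-out split.
linear_r2 = linear_model.score(X_te, y_te)
forest_r2 = forest_model.score(X_te, y_te)
print(f"linear R^2: {linear_r2:.3f}, forest R^2: {forest_r2:.3f}")
```

To try it on the real data, X_train, X_test, y_train and y_test from the split in the post can be passed in place of the synthetic arrays.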
