Dec-11-2017, 04:09 PM
I was thinking of fitting a linear regression to a set of error data, and then using it to predict what the error count might be in some future week.
These data have two columns:
(1) "ErrorDate" -> week number (of year), and
(2) "ErrorCount" (how many errors did the system have in that week).
I would imagine these data are pretty noisy (random), but who knows?
Anyway, I tried to load this data and do a basic LinearRegression fit test with pandas and scikit-learn, but got an error.
ERROR: "ValueError: Expected 2D array, got 1D array instead:"
--
The code seems so simple, like it should work:
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the two-column CSV data into a pandas DataFrame
thedf = pd.read_csv("Errors.csv", sep=",")
X_train, X_test, y_train, y_test = train_test_split(
    thedf['ErrorCount'], thedf['ErrorDate'], random_state=0)
print(thedf.head())
>>>> Prints:
   ErrorDate  ErrorCount
0          1          80
1          2         118
2          3         249
3          4         397
4          5         159
So far, so good.
But the shape is apparently wrong, and I get the error noted above.
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))
>>>> Prints:
X_test shape: (13,)
y_test shape: (13,)
--
So, I see the shape is the problem, but it's not clear to me from searches I did how to change it. This is probably a super simple question. I have a Pandas book on order but it won't arrive for another week.
Suggestions?
Thanks very much in advance,
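
For reference, here is a minimal sketch of one common way around this error, not necessarily the only or best one. scikit-learn estimators expect the feature matrix X to be 2D (rows by features), and selecting a column with single brackets gives a 1D Series. Selecting with double brackets keeps it as a one-column DataFrame. The sketch also assumes the week number is the feature and the error count is the target, which is the reverse of the order passed to train_test_split in the post. The five inline rows stand in for Errors.csv, and week 6 is just a hypothetical future week.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("Errors.csv"): the five rows shown in the post.
thedf = pd.DataFrame({
    "ErrorDate": [1, 2, 3, 4, 5],
    "ErrorCount": [80, 118, 249, 397, 159],
})

# Double brackets select a one-column DataFrame (2D); single brackets
# return a Series (1D), which is what triggers the ValueError for X.
X = thedf[["ErrorDate"]]   # feature: week number, shape (n, 1)
y = thedf["ErrorCount"]    # target: errors in that week, shape (n,)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# Predict for a hypothetical future week; the input must be 2D as well.
future = pd.DataFrame({"ErrorDate": [6]})
predicted = model.predict(future)

print("X_train shape: {}".format(X_train.shape))
print("Predicted error count for week 6: {}".format(predicted[0]))
```

If the data is already a 1D array or Series, thedf['ErrorDate'].values.reshape(-1, 1) should give the same 2D shape. Only X needs to be 2D; y can stay 1D.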