Hello, I am new to feature engineering. Below are a sample pandas DataFrame and the resulting feature DataFrame.
    import pandas as pd

    df = pd.DataFrame({
        'city': ['seoul', 'seoul', 'seoul', 'pusan', 'pusan', 'pusan', 'gwangju', 'gwangju', 'gwangju'],
        'date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-02-01', '2022-03-01'],
        'price': [7, 9, 4, 7, 5, 2, 1, 8, 6],
        'quantity': [89, 53, 75, 33, 96, 72, 59, 25, 82],
        'sales_amount': [3.4, 6.1, 9.2, 7.2, 2.9, 8.1, 5.9, 4.4, 7.9],
    })
    # Use (city, date) as a two-level MultiIndex
    df.set_index(['city', 'date'], inplace=True)
    print(df)
Output:

                        price  quantity  sales_amount
    city    date
    seoul   2022-01-01      7        89           3.4
            2022-02-01      9        53           6.1
            2022-03-01      4        75           9.2
    pusan   2022-01-01      7        33           7.2
            2022-02-01      5        96           2.9
            2022-03-01      2        72           8.1
    gwangju 2022-01-01      1        59           5.9
            2022-02-01      8        25           4.4
            2022-03-01      6        82           7.9
The feature attributes are ['price', 'quantity'] and the target is 'sales_amount'. I am trying to predict 'sales_amount' with a linear regression algorithm. The training data are extracted from the DataFrame:

    X = df.iloc[:, 0:2]
    y = df.iloc[:, 2]

But my issue is how to handle the multi-index, ['city', 'date'], of the feature matrix. Do I have to transform these index levels with 'sklearn.preprocessing.OneHotEncoder'?
    from sklearn.preprocessing import OneHotEncoder

    ohe = OneHotEncoder(sparse=False)
    ohe.fit(df.index.values.reshape(-1, 1))
    encorded_index = ohe.transform(df.index.values.reshape(-1, 1)).toarray()

But it throws the error:

    TypeError: unhashable type: 'numpy.ndarray'
In most feature engineering references the index is a simple integer, so I have no idea how to handle the multi-index of the feature DataFrame.
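For illustration, here is a minimal sketch of one possible way to handle the multi-index, not a definitive answer. It assumes the index levels are first moved back into ordinary columns with reset_index(), because df.index.values on a MultiIndex is an array of (city, date) tuples, which is probably not what the encoder expects. The derived 'month' column and the choice to one-hot encode only 'city' are assumptions made for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'city': ['seoul'] * 3 + ['pusan'] * 3 + ['gwangju'] * 3,
    'date': ['2022-01-01', '2022-02-01', '2022-03-01'] * 3,
    'price': [7, 9, 4, 7, 5, 2, 1, 8, 6],
    'quantity': [89, 53, 75, 33, 96, 72, 59, 25, 82],
    'sales_amount': [3.4, 6.1, 9.2, 7.2, 2.9, 8.1, 5.9, 4.4, 7.9],
}).set_index(['city', 'date'])

# Move the index levels back into ordinary columns so they can act as features.
flat = df.reset_index()
flat['date'] = pd.to_datetime(flat['date'])
# Assumption for this sketch: the month number is a usable numeric date feature.
flat['month'] = flat['date'].dt.month

X = flat[['city', 'month', 'price', 'quantity']]
y = flat['sales_amount']

# One-hot encode only the categorical 'city' column; pass numeric columns through.
preprocess = ColumnTransformer(
    [('city', OneHotEncoder(handle_unknown='ignore'), ['city'])],
    remainder='passthrough',
)
model = make_pipeline(preprocess, LinearRegression())
model.fit(X, y)
predictions = model.predict(X)
```

Whether 'city' and 'date' should be features at all depends on the problem. If they only identify rows, they can simply stay in the index, since scikit-learn estimators fit on the column values only.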
One more question: since the index of the DataFrame contains the 'date' level, is it possible to use 'train_test_split' from the 'sklearn.model_selection' module when fitting the linear regression model on the training data?
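As a rough sketch of the two options, assuming the same sample data: train_test_split accepts DataFrames and keeps each row's (city, date) index label, but it shuffles rows by default, so dates get mixed between the training and test sets. If time order matters, a split by date may be more appropriate. The cutoff date below is a hypothetical value chosen only for this example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'city': ['seoul'] * 3 + ['pusan'] * 3 + ['gwangju'] * 3,
    'date': ['2022-01-01', '2022-02-01', '2022-03-01'] * 3,
    'price': [7, 9, 4, 7, 5, 2, 1, 8, 6],
    'quantity': [89, 53, 75, 33, 96, 72, 59, 25, 82],
    'sales_amount': [3.4, 6.1, 9.2, 7.2, 2.9, 8.1, 5.9, 4.4, 7.9],
}).set_index(['city', 'date'])

X = df[['price', 'quantity']]
y = df['sales_amount']

# Option 1: random split. Rows are shuffled; the MultiIndex travels with each row.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, random_state=0
)

# Option 2: chronological split. Train on earlier dates, test on later ones.
dates = pd.to_datetime(df.index.get_level_values('date'))
cutoff = pd.Timestamp('2022-03-01')  # hypothetical cutoff for this sketch
train_mask = dates < cutoff

X_train_t, y_train_t = X[train_mask], y[train_mask]
X_test_t, y_test_t = X[~train_mask], y[~train_mask]

model = LinearRegression().fit(X_train_t, y_train_t)
preds = model.predict(X_test_t)
```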
Best regards