Python Forum
multi index issue of one hot encoder preprocessing
Hello, I am new to feature engineering. Below are a sample pandas DataFrame and its printed output.

import pandas as pd

df = pd.DataFrame({'city':['seoul', 'seoul', 'seoul', 'pusan', 'pusan', 'pusan', 'gwangju', 'gwangju', 'gwangju'] , 
                   'date':['2022-01-01', '2022-02-01', '2022-03-01','2022-01-01', '2022-02-01', '2022-03-01','2022-01-01', '2022-02-01', '2022-03-01'],
                   'price': [7, 9, 4, 7, 5, 2, 1, 8, 6],
                   'quantity':[89, 53, 75, 33, 96, 72, 59, 25, 82],
                   'sales_amount': [3.4, 6.1, 9.2, 7.2, 2.9, 8.1, 5.9, 4.4, 7.9]})

df.set_index(['city', 'date'], inplace=True)
print(df)
Output:
                    price  quantity  sales_amount
city    date
seoul   2022-01-01      7        89           3.4
        2022-02-01      9        53           6.1
        2022-03-01      4        75           9.2
pusan   2022-01-01      7        33           7.2
        2022-02-01      5        96           2.9
        2022-03-01      2        72           8.1
gwangju 2022-01-01      1        59           5.9
        2022-02-01      8        25           4.4
        2022-03-01      6        82           7.9
The feature columns are ['price', 'quantity'] and the target is 'sales_amount'. I am trying to predict 'sales_amount' with a linear regression algorithm. The training data are extracted from the DataFrame:

X = df.iloc[:, 0:2]  # feature columns: price, quantity
y = df.iloc[:, 2]    # target column: sales_amount
My issue is how to handle the MultiIndex ['city', 'date'] of the feature matrix. Do I have to transform these index levels with 'sklearn.preprocessing.OneHotEncoder'? This is what I tried:

from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder(sparse=False)
ohe.fit(df.index.values.reshape(-1, 1))
encorded_index = ohe.transform(df.index.values.reshape(-1, 1)).toarray()
But it raises the following error:

Error:
TypeError: unhashable type: 'numpy.ndarray'
In most feature engineering references the index is a simple integer, so I have no idea how to handle the MultiIndex of the feature DataFrame.
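One possible way to approach this, given only as a sketch and not as the definitive fix: move the index levels back into ordinary columns with reset_index() and pass the 'city' column to OneHotEncoder as a 2-D selection. The shortened six-row sample and the names flat, city_columns and city_df are assumptions made for illustration. The encoder is left at its default sparse output and converted with .toarray(), which avoids the 'sparse' keyword whose name differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Shortened version of the sample data (assumption: two dates per city).
df = pd.DataFrame({
    'city': ['seoul', 'seoul', 'pusan', 'pusan', 'gwangju', 'gwangju'],
    'date': ['2022-01-01', '2022-02-01', '2022-01-01',
             '2022-02-01', '2022-01-01', '2022-02-01'],
    'price': [7, 9, 7, 5, 1, 8],
    'quantity': [89, 53, 33, 96, 59, 25],
    'sales_amount': [3.4, 6.1, 7.2, 2.9, 5.9, 4.4],
}).set_index(['city', 'date'])

# Turn the index levels back into ordinary columns.
flat = df.reset_index()

# OneHotEncoder expects 2-D input; selecting with a list keeps the data 2-D.
ohe = OneHotEncoder()  # default output is a sparse matrix
city_encoded = ohe.fit_transform(flat[['city']]).toarray()

# Build readable column names from the categories the encoder learned.
city_columns = ['city_' + c for c in ohe.categories_[0]]
city_df = pd.DataFrame(city_encoded, columns=city_columns, index=flat.index)

# Feature matrix: numeric columns plus the encoded city indicators.
X = pd.concat([flat[['price', 'quantity']], city_df], axis=1)
y = flat['sales_amount']
```

Whether 'city' should be a feature at all depends on the problem. If it is only a label for the rows, it can stay in the index and be left out of X.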

One more question:
Since the DataFrame index contains the 'date' level, is it still possible to use 'train_test_split' from the 'sklearn.model_selection' module to split the data before fitting the linear regression model?
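As a rough sketch of the two options, assuming the sample data above: train_test_split accepts a DataFrame with a MultiIndex and carries the index along without using it as a feature. Because the rows are dated, a chronological hold-out may be more appropriate than a random one. The second option below holds out the latest date for every city. The variable names and the choice of holding out exactly one date are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'city': ['seoul', 'seoul', 'seoul', 'pusan', 'pusan', 'pusan',
             'gwangju', 'gwangju', 'gwangju'],
    'date': ['2022-01-01', '2022-02-01', '2022-03-01'] * 3,
    'price': [7, 9, 4, 7, 5, 2, 1, 8, 6],
    'quantity': [89, 53, 75, 33, 96, 72, 59, 25, 82],
    'sales_amount': [3.4, 6.1, 9.2, 7.2, 2.9, 8.1, 5.9, 4.4, 7.9],
}).set_index(['city', 'date'])

X = df[['price', 'quantity']]
y = df['sales_amount']

# Option 1: random split. The MultiIndex stays attached to the rows
# but is not part of the feature matrix.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, random_state=0)

# Option 2: chronological split, holding out the latest date per city.
dates = pd.to_datetime(df.index.get_level_values('date'))
is_test = dates == dates.max()
X_train_t, X_test_t = X[~is_test], X[is_test]
y_train_t, y_test_t = y[~is_test], y[is_test]

# Fit on the earlier dates and predict the held-out date.
model = LinearRegression().fit(X_train_t, y_train_t)
predictions = model.predict(X_test_t)
```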

Best regards