Hi, I just want to fully understand DataFrame operations. I have read some documentation, but I still have some questions.
1) I want to change my DataFrame itself, not create a copy. It works at the beginning, but in a later line it somehow changes automatically.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF=DF.sort_values("price",ascending=False).iloc[131:]
price=DF.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])

DF=DF.drop("price", axis=1)
DF=DF.drop("transmission",axis=1)
DF=pd.concat([DF,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)

Error:
Traceback (most recent call last):
File "C:\Users\oby_pc\Desktop\programing\veri bilimi için python ve tensorflow\2_merc.py", line 30, in <module>
x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)
File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2127, in train_test_split
arrays = indexable(*arrays)
File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 293, in indexable
check_consistent_length(*result)
File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 256, in check_consistent_length
raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [13116, 12988]
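A plausible source of the two different counts in the ValueError above is index alignment in pd.concat. With axis=1, pd.concat joins on index labels (an outer join by default), while sort_values followed by iloc keeps the original row labels. The sketch below reproduces that effect on a tiny frame. It assumes the file was read with the default 0..n-1 index; the data and the names kept, extra, misaligned and aligned are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for merc.xlsx: 6 rows with the default index 0..5.
df = pd.DataFrame({
    "price": [300, 100, 600, 200, 500, 400],
    "transmission": ["Manual", "Automatic", "Manual",
                     "Semi-Auto", "Automatic", "Manual"],
})

# Sorting and slicing keeps the ORIGINAL labels of the surviving rows,
# so the index here is 5, 0, 3, 1 rather than 0..3.
kept = df.sort_values("price", ascending=False).iloc[2:]

# A frame built with index=range(len(...)) is labelled 0..3 instead.
extra = pd.DataFrame({"flag": [1, 0, 1, 0]}, index=range(len(kept)))

# concat(axis=1) aligns on labels: any label present in only one frame
# becomes an additional row padded with NaN.
misaligned = pd.concat([kept, extra], axis=1)

# Giving both frames the same 0..n-1 labels avoids the extra rows.
aligned = pd.concat([kept.reset_index(drop=True), extra], axis=1)

print(len(kept), len(misaligned), len(aligned))
```

In this sketch the misaligned result has more rows than either input, which matches the pattern of the larger first number in the error. Calling reset_index(drop=True) right after the iloc slice, or building the new frame with index=DF.index, are two ways to keep the labels consistent.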
I understand the error, but I don't understand why it happens.

2) When I create a new DataFrame as DF2, the error is the same.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF2=DF.sort_values("price",ascending=False).iloc[131:]
price=DF2.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF2.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF2.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])

DF2=DF2.drop("price", axis=1)
DF2=DF2.drop("transmission",axis=1)
DF2=pd.concat([DF2,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)

Bonus question: is there an easier way to use ohe and le than my code?
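For the bonus question, one commonly used shortcut is pandas.get_dummies, which one-hot encodes a string column directly and keeps the original row labels, so no LabelEncoder step and no separate concat are needed. This is only a sketch: the sample values, the mileage column and the names X and y are made up for illustration, and the real category names in merc.xlsx may differ.

```python
import pandas as pd

# Hypothetical sample; the non-default index imitates a frame that has
# already been sorted and sliced.
df = pd.DataFrame(
    {
        "price": [300, 100, 600, 200],
        "transmission": ["Manual", "Automatic", "Semi-Auto", "Manual"],
        "mileage": [10, 20, 30, 40],
    },
    index=[7, 2, 9, 4],
)

# Target column, taken before it is dropped from the features.
y = df["price"]

# One-hot encode "transmission" in place of the LabelEncoder + OneHotEncoder
# + concat sequence; the row labels of df are carried over unchanged.
X = pd.get_dummies(df.drop(columns="price"), columns=["transmission"])

print(list(X.columns))
print(len(X), len(y))
```

Because X and y come from the same frame, they have the same number of rows and the same labels. If the encoding has to be fitted on the training data only, scikit-learn's OneHotEncoder also accepts string columns directly, so the LabelEncoder step can be dropped there as well.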
Thank you
buran write Jan-10-2021, 02:19 PM:
Please use proper tags when posting code, traceback, output, etc. This time I have added tags for you.
See BBcode help for more info.