Python Forum

DataFrame operations didn't change original
Hi, I just want to fully understand DataFrame operations. I have read some documentation, but I still have some questions.

1) I want to change my DataFrame itself, not create a copy. It is fine at the beginning, but on a later line it somehow changes automatically.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF=DF.sort_values("price",ascending=False).iloc[131:]
price=DF.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF=DF.drop("price", axis=1)
DF=DF.drop("transmission",axis=1)

DF=pd.concat([DF,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Error:
Traceback (most recent call last):
  File "C:\Users\oby_pc\Desktop\programing\veri bilimi için python ve tensorflow\2_merc.py", line 30, in <module>
    x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2127, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 293, in indexable
    check_consistent_length(*result)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 256, in check_consistent_length
    raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [13116, 12988]
I understand the error, but I don't understand why it happens.

2) When I create a new DataFrame, DF2, the error is the same.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF2=DF.sort_values("price",ascending=False).iloc[131:]
price=DF2.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF2.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF2.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF2=DF2.drop("price", axis=1)
DF2=DF2.drop("transmission",axis=1)

DF2=pd.concat([DF2,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Bonus question: is there an easier way to use ohe and le than my code?

Thank you.
It would help to have the whole error message, so we can see where this is happening.

I believe the issue is the way Python handles variable names. a = b does not copy b to a; it makes a another name for the same object that b refers to. A change made to that object is visible through both a and b. The exception is an operation that reassigns one of the names to a new object, which separates the two.

If that sounds confusing, I would refer you to Ned Batchelder's PyCon 2015 talk.
Add:
a = [1,2,3]
b = a
b.append(4)
print(a)
Output:
[1, 2, 3, 4]
Appended to b but it affected a as well.
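The "unless" case mentioned earlier, reassignment, can be sketched the same way. This is only an illustration with made-up names and data: rebinding b to a new object leaves a untouched, and pandas methods such as drop return a new DataFrame by default instead of modifying the one they are called on.

```python
import pandas as pd

# Mutation: a and b are two names for the same list object.
a = [1, 2, 3]
b = a
b.append(4)
shared = a is b            # True: still one object

# Reassignment: b now names a brand-new list; a is not affected.
b = b + [5]
separate = a is not b      # True: two distinct objects now

# pandas: drop() returns a new DataFrame by default (illustrative data).
df = pd.DataFrame({"price": [10, 20], "year": [2019, 2020]})
df2 = df.drop("price", axis=1)

print(a)                   # [1, 2, 3, 4]
print(b)                   # [1, 2, 3, 4, 5]
print(list(df.columns))    # ['price', 'year']
print(list(df2.columns))   # ['year']
```

So a line like DF = DF.drop("price", axis=1) rebinds the name DF to the new result; it does not alter any other name that still refers to the old frame.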
(Jan-10-2021, 12:27 PM) jefsummers wrote: It would help to have the whole error message, so we can see where this is happening.

I edited the original message and added the whole error message.

I understand the change issue, but in my example it didn't change. I traced DF.shape step by step, and after "line 22" it changes back to its original value, so the alterations that I made are gone.
(Jan-10-2021, 12:30 PM) jefsummers wrote: Appended to b but it affected a as well.

I understand this, but in my example a somehow goes back to its original value, like [1, 2, 3], somewhere along the way.
The difference is 128 records. I would check the lengths of DF and price right before the error. BTW, the error message says line 30, but you don't give us line 30. I am assuming the error is actually occurring in line 24.
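One way to run that length check, along with one possible cause worth ruling out (it is not confirmed anywhere in this thread): pd.concat with axis=1 aligns rows by index label, not by position. In the posted code, DF keeps its original labels after sort_values and iloc, while trans is built with index=range(len(trans)), so any label present on only one side would add a NaN-filled row. The sketch below uses a small made-up frame instead of merc.xlsx; the column values and the name flag are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet; only the shape of the problem matters.
df = pd.DataFrame({
    "price": [50, 10, 40, 20, 30],
    "transmission": ["Automatic", "Manual", "Semi-Auto", "Manual", "Automatic"],
    "mileage": [1, 2, 3, 4, 5],
})

# Sort and drop the most expensive row: the original index labels are kept.
df = df.sort_values("price", ascending=False).iloc[1:]
price = df[["price"]]

# A frame built with a fresh 0..n-1 index, like `trans` in the question.
trans = pd.DataFrame({"flag": [1.0] * len(df)}, index=range(len(df)))

# concat aligns on labels, so labels present on one side only add rows.
misaligned = pd.concat([df.drop("price", axis=1), trans], axis=1)
print(len(price), len(misaligned))   # 4 5

# Possible fix 1: give both sides the same positional index before concat.
features = df.drop("price", axis=1).reset_index(drop=True)
aligned = pd.concat([features, trans], axis=1)

# Possible fix 2 (also a shorter encoding route): get_dummies keeps the
# existing index, so no manual concat is needed.
encoded = pd.get_dummies(df.drop("price", axis=1), columns=["transmission"])
print(len(aligned), len(encoded))    # 4 4
```

If the real data behaves the same way, printing len(DF) and len(price) just before train_test_split should show the mismatch, and DF.isna().sum() should show NaN values in the extra rows. pd.get_dummies is also one answer to the bonus question, since it replaces the LabelEncoder plus OneHotEncoder steps for a single column.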