Python Forum
DataFrame operations didn't change original
#1
Hi, I just want to fully understand DataFrame operations. I have read some documentation, but I still have some questions.

1) I want to change my DataFrame itself, not create a copy. It works at first, but on a later line it somehow changes automatically.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn
 
DF=pd.read_excel("merc.xlsx")
DF=DF.sort_values("price",ascending=False).iloc[131:]
price=DF.iloc[:,1:2]
 
from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF=DF.drop("price", axis=1)
DF=DF.drop("transmission",axis=1)
 
DF=pd.concat([DF,trans],axis=1)
 
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF,price, test_size=0.33, random_state=15)
 
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Error:
Traceback (most recent call last):
  File "C:\Users\oby_pc\Desktop\programing\veri bilimi için python ve tensorflow\2_merc.py", line 30, in <module>
    x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2127, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 293, in indexable
    check_consistent_length(*result)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 256, in check_consistent_length
    raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [13116, 12988]
I understand the error, but I don't understand why it happens.

2) When I create a new DataFrame, DF2, instead of reassigning DF, the error is the same.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn
 
DF=pd.read_excel("merc.xlsx")
DF2=DF.sort_values("price",ascending=False).iloc[131:]
price=DF2.iloc[:,1:2]
 
from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF2.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF2.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF2=DF2.drop("price", axis=1)
DF2=DF2.drop("transmission",axis=1)
 
DF2=pd.concat([DF2,trans],axis=1)
 
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)
 
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Bonus question: is there an easier way to use ohe and le than in my code?

Thank you.
buran wrote Jan-10-2021, 02:19 PM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
It would help to have the whole error message, so we can see where this is happening.

I believe the issue is the way Python handles variable names. a = b does not copy b to a; rather, it makes a a second name for the same object that b refers to. Mutating that object is visible through both a and b, unless one of the names is later reassigned, which separates the two.

If that sounds confusing, I would refer you to Ned Batchelder's PyCon 2015 talk.
#3
Add:
a = [1,2,3]
b = a
b.append(4)
print(a)
Output:
[1, 2, 3, 4]
Appended to b but it affected a as well.
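The "reassignment that separates the two" can be sketched the same way. This is only a minimal illustration with plain lists; the name c and the values are made up for the example.

```python
a = [1, 2, 3]
b = a              # b is a second name for the same list object
b.append(4)        # mutation: visible through both names
print(a)           # [1, 2, 3, 4]

b = b + [5]        # rebinding: builds a new list and points b at it
print(a)           # [1, 2, 3, 4]  (a is unchanged)
print(b)           # [1, 2, 3, 4, 5]
print(a is b)      # False

c = a.copy()       # explicit shallow copy, independent of a from the start
c.append(99)
print(a)           # [1, 2, 3, 4]
```

The same distinction applies to a statement like DF = DF.drop("price", axis=1): drop returns a new DataFrame by default and the assignment rebinds the name DF, so any other name still pointing at the old object is not affected.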
#4
(Jan-10-2021, 12:27 PM)jefsummers Wrote: It would help to have the whole error message, so we can see where this is happening.

I believe the issue is the way Python handles variable names. a = b does not copy b to a; rather, it makes a a second name for the same object that b refers to. Mutating that object is visible through both a and b, unless one of the names is later reassigned, which separates the two.

If that sounds confusing, I would refer you to Ned Batchelder's PyCon 2015 talk.

I edited the original message and added the whole error message.

I understand the aliasing issue, but in my example the DataFrame didn't stay changed. I traced DF.shape step by step, and after "line 22" it goes back to its original value, so the alterations I made are gone.
#5
(Jan-10-2021, 12:30 PM)jefsummers Wrote: Add:
a = [1,2,3]
b = a
b.append(4)
print(a)
Output:
[1, 2, 3, 4]
Appended to b but it affected a as well.

I understand this, but in my example, somehow, somewhere, a goes back to its original values, like [1,2,3], again.
#6
The difference is 128 records (13116 - 12988). I would check the lengths of DF and price right before the error. BTW, the error message says line 30, but you don't give us a line 30. I am assuming the error is actually occurring on line 24.
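A minimal sketch of that length check, using a tiny made-up frame instead of merc.xlsx (the column values and the names extra, combined and aligned are assumptions for illustration). It also shows one possible way a row count can grow: pd.concat with axis=1 aligns rows on index labels, so combining a sorted, sliced frame with a frame indexed by range(len(...)) gives the union of both indexes. Whether that is what happens in the original script is an assumption; printing the lengths and indexes there would confirm it or rule it out.

```python
import pandas as pd

# Tiny stand-in for the spreadsheet; the values are invented for illustration.
df = pd.DataFrame({"price": [10, 50, 30, 40, 20],
                   "mileage": [5, 1, 3, 2, 4]})

# Sort and drop the top row; the original index labels are kept.
df = df.sort_values("price", ascending=False).iloc[1:]
price = df[["price"]]
print(list(df.index))             # [3, 2, 4, 0]

# A second frame built with a fresh 0..n-1 index, like `trans` in the script.
extra = pd.DataFrame({"flag": [1.0, 0.0, 1.0, 0.0]}, index=range(len(df)))

# axis=1 aligns on index labels, so the result holds the union of the labels.
combined = pd.concat([df.drop("price", axis=1), extra], axis=1)
print(len(combined), len(price))  # 5 4

# Giving both frames the same index keeps the rows lined up.
aligned = pd.concat(
    [df.drop("price", axis=1).reset_index(drop=True), extra], axis=1
)
print(len(aligned), len(price))   # 4 4
```

In the sketch, the mismatched lengths come from the concat step and not from any later line, which is why checking len() and .index right after each step narrows the problem down quickly.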