Python Forum
DataFrame operations didn't change original
#1
Hi, I just want to fully understand DataFrame operations. I have read some documentation, but I still have some questions.

1) I want to change my DataFrame itself, not make a copy of it. It is fine at the beginning, but at a later line it somehow changes by itself.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF=DF.sort_values("price",ascending=False).iloc[131:]
price=DF.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF=DF.drop("price", axis=1)
DF=DF.drop("transmission",axis=1)

DF=pd.concat([DF,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Error:
Traceback (most recent call last):
  File "C:\Users\oby_pc\Desktop\programing\veri bilimi için python ve tensorflow\2_merc.py", line 30, in <module>
    x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2127, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 293, in indexable
    check_consistent_length(*result)
  File "C:\Users\oby_pc\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 256, in check_consistent_length
    raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [13116, 12988]
I understand the error, but I don't understand why it happens.

2) When I create a new DataFrame, DF2, instead of reassigning DF, the error is the same.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sbn

DF=pd.read_excel("merc.xlsx")
DF2=DF.sort_values("price",ascending=False).iloc[131:]
price=DF2.iloc[:,1:2]

from sklearn import preprocessing
ohe = preprocessing.OneHotEncoder()
le=preprocessing.LabelEncoder()
trans=DF2.iloc[:,2:3].values
trans[:,0]=le.fit_transform(DF2.iloc[:,2])
trans = ohe.fit_transform(trans).toarray()
trans=pd.DataFrame(data=trans, index=range(len(trans)), columns=["Auto","man","other","semi"])
DF2=DF2.drop("price", axis=1)
DF2=DF2.drop("transmission",axis=1)

DF2=pd.concat([DF2,trans],axis=1)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split (DF2,price, test_size=0.33, random_state=15)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform (x_test)
Bonus question: is there an easier way to use OneHotEncoder (ohe) and LabelEncoder (le) than my code?

Thank you
buran wrote Jan-10-2021, 02:19 PM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
Reply
#2
It would help to have the whole error message, so we can see where this is happening.

I believe the issue is the way Python handles variable names. a=b does not copy b to a, rather it creates variable a that points to the same value that b does. Changes to that value mean that a and b are both changed. Usually. Unless the operation results in a reassignment that separates the two.

This may sound confusing, so I would refer you to the PyCon 2015 talk by Ned Batchelder.
Reply
#3
Add:
a = [1,2,3]
b = a
b.append(4)
print(a)
Output:
[1, 2, 3, 4]
Appended to b but it affected a as well.
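The "reassignment that separates the two" case looks like this with a DataFrame. This is only a small sketch: the column names and values are made up for illustration and are not taken from merc.xlsx. It shows that an in-place change is visible through both names, while something like DF = DF.drop(...) builds a new object and rebinds only the name on the left.

```python
import pandas as pd

# Made-up example data (hypothetical, not from the original post).
df = pd.DataFrame({"price": [3, 1, 2], "year": [2015, 2016, 2017]})
alias = df  # a second name for the same object, no copy is made

# In-place change: both names still point at the one object, so both see it.
alias["sold"] = True
print(list(df.columns))     # ['price', 'year', 'sold']

# drop() returns a new DataFrame; the assignment rebinds only the name alias.
alias = alias.drop(columns="sold")
print(list(df.columns))     # ['price', 'year', 'sold']
print(list(alias.columns))  # ['price', 'year']
print(alias is df)          # False
```

So lines like DF=DF.drop("price", axis=1) in the posted code do not undo themselves later. After them, DF simply names the new, smaller frame.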
Reply
#4
(Jan-10-2021, 12:27 PM)jefsummers Wrote: It would help to have the whole error message, so we can see where this is happening.

I believe the issue is the way Python handles variable names. a=b does not copy b to a, rather it creates variable a that points to the same value that b does. Changes to that value mean that a and b are both changed. Usually. Unless the operation results in a reassignment that separates the two.

This may sound confusing, so I would refer you to the PyCon 2015 talk by Ned Batchelder.

I edited the original message and added the whole error message.

I understand the issue with shared changes, but in my example it didn't change. I traced DF.shape step by step, and after "line 22" it changes back to its original value, so the alterations I made are gone.
Reply
#5
(Jan-10-2021, 12:30 PM)jefsummers Wrote: Add:
a = [1,2,3]
b = a
b.append(4)
print(a)
Output:
[1, 2, 3, 4]
Appended to b but it affected a as well.

I understand this, but in my example, somehow and somewhere, a changes back to its original values, like [1,2,3] again.
Reply
#6
The difference is 128 records. I would check the lengths of DF and price right before the error. BTW, the error message says line 30, but you don't give us line 30. I am assuming the error is actually occurring in line 24.
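One likely cause of the extra rows, assuming the posted code is what actually ran: sort_values keeps the original row labels, while trans is built with index=range(len(trans)). pd.concat(..., axis=1) aligns rows by label, so any labels that appear in only one of the two frames become extra rows filled with NaN. The sketch below reproduces the effect on a tiny made-up frame (hypothetical data, not merc.xlsx) and shows two possible fixes.

```python
import pandas as pd

# Made-up stand-in for merc.xlsx: 6 rows with the default 0..5 index.
df = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020],
    "price": [300, 100, 600, 200, 500, 400],
    "transmission": ["Manual", "Automatic", "Manual",
                     "Semi-Auto", "Automatic", "Manual"],
})

# Sorting keeps the original labels, so after slicing the index is 5, 0, 3, 1.
df = df.sort_values("price", ascending=False).iloc[2:]
price = df[["price"]]
features = df.drop(columns=["price", "transmission"])

# Same pattern as the post: a new frame labelled 0..n-1.
encoded = pd.DataFrame({"flag": [1.0, 0.0, 1.0, 0.0]}, index=range(len(df)))

# concat aligns on labels: the union {0, 1, 2, 3, 5} has 5 rows, not 4.
misaligned = pd.concat([features, encoded], axis=1)
print(len(price), len(misaligned))  # 4 5

# Fix A: give the new frame the same labels as df.
encoded_fixed = pd.DataFrame({"flag": [1.0, 0.0, 1.0, 0.0]}, index=df.index)
fixed_a = pd.concat([features, encoded_fixed], axis=1)

# Fix B: pd.get_dummies one-hot encodes a column and keeps df's index.
dummies = pd.get_dummies(df["transmission"])
fixed_b = pd.concat([features, dummies], axis=1)
print(len(fixed_a), len(fixed_b))  # 4 4
```

Fix B may also answer the bonus question, since pd.get_dummies replaces the LabelEncoder plus OneHotEncoder steps in one call. Calling reset_index(drop=True) on the sorted frame before building price would be another option.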
Reply

