Posts: 164
Threads: 88
Joined: Feb 2021
I would like to copy a column from one dataframe and add it to a second dataframe as the last column. I know it can be done, but I am looking for a
simple way to do it. Is there a way to do it in one, two, or three lines of code? This is all using pandas and Python, of course.
Any help appreciated.
Thanks in advance.
Respectfully,
LZ
Posts: 6,806
Threads: 20
Joined: Feb 2020
Jul-07-2022, 01:48 PM
(This post was last modified: Jul-07-2022, 01:48 PM by deanhystad.)
dataframe_a['New Column'] = dataframe_b['Existing Column']
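A minimal sketch of that one-liner, using small made-up frames (the names dataframe_a and dataframe_b follow the line above; the data and the "x" column are invented for illustration). Assigning to a column name that does not exist yet appends it as the last column. pandas aligns the assigned values on the index, so when the two frames do not share an index, one option is to pass the raw values with .to_numpy(), assuming both frames have the same number of rows in the same order.

```python
import pandas as pd

# Hypothetical data, only for illustration.
dataframe_a = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
dataframe_b = pd.DataFrame({"Existing Column": ["a", "b", "c"]}, index=[10, 11, 12])

# The indexes differ here, so assign the underlying values positionally.
# With matching indexes, dataframe_b["Existing Column"] alone would be enough.
dataframe_a["New Column"] = dataframe_b["Existing Column"].to_numpy()

# The new column ends up last.
print(list(dataframe_a.columns))
```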
Posts: 164
Threads: 88
Joined: Feb 2021
Thanks, I will give it a try.
Respectfully,
LZ
Posts: 164
Threads: 88
Joined: Feb 2021
Jul-07-2022, 04:23 PM
(This post was last modified: Jul-07-2022, 04:30 PM by Led_Zeppelin.)
I tried the simple command that you gave me:
df['machine_status'] = df1['machine_status']
Now df is the new dataframe and df1 is the old copied dataframe. I am trying to move the column named 'machine_status' from df1 to df and put it in the last column. The column is named 'machine_status' in both cases.
I got the following error:
Error: KeyError Traceback (most recent call last)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'machine_status'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Input In [25], in <cell line: 1>()
----> 1 df['machine_status'] = df1['machine_status']
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 'machine_status'
Where is the error? I think that I entered the python code correctly. The python interpreter says that I did not.
What is a key error?
Thanks in advance.
Respectfully,
LZ
Posts: 6,806
Threads: 20
Joined: Feb 2020
Is "machine_status" a key (column header) in df1?
This is how things work in theory
import pandas as pd
df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4]})
df1["Integers"] = df2["Numbers"] # "Integers" can be any new name. "Numbers" has to be an existing column in df2
print(df1)
Output:
  Letters  Integers
0       A         1
1       B         2
2       C         3
3       D         4
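A quick way to answer the "is it a column?" question before assigning is to look at the column list directly. This is only a sketch built on the two example frames above; the variable name column is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4]})

# Show exactly which column headers df2 has (watch for typos and stray spaces).
print(df2.columns.tolist())

column = "Numbers"  # the header we expect to find in df2
if column in df2.columns:
    df1["Integers"] = df2[column]
else:
    # Indexing with a missing header is what raises KeyError.
    print(f"{column!r} is not a column of df2")
```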
Posts: 164
Threads: 88
Joined: Feb 2021
Yes, it is. I took three columns off the df dataframe, but before I did that, I made a copy of df and called it df1.
I am trying to put machine_status back onto the slimmed-down dataframe, df. I have done all of my scaling and normalizing on df, so now I am trying to put it back together. I want to reverse the dropping of the three columns, but only for the machine_status column for now.
That is where it fails. df is only a slimmed-down version of its former self.
But "machine_status" is the name of the column I have been discussing.
I hope this is informative.
Respectfully,
LZ
Posts: 164
Threads: 88
Joined: Feb 2021
I like your example. But please understand, doing it that way would be very hard.
My machine_status column has over 220,000 values, so it is not practical. I really want to use the shortcut method, but I keep getting this KeyError.
That is why I chose the shorthand way.
I will keep trying.
R,
LZ
Posts: 6,806
Threads: 20
Joined: Feb 2020
Jul-07-2022, 06:01 PM
(This post was last modified: Jul-07-2022, 08:04 PM by deanhystad.)
It doesn't make any difference how many "values" there are. The dataframe operations shown work on whole series (columns). A column containing 220,000 values works exactly like one containing 4.
This part is just so I have two dataframes to work with.
df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4]})
If you prefer, they could each have 1,000,000 values.
I get an error when I do this:
import pandas as pd
df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4, 5]})
df1["Integers"] = df2["Number"]
print(df1)
Error: Traceback (most recent call last):
File "...\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "...test.py", line 6, in <module>
df1["Integers"] = df2["Number"]
File "...\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "...\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'Number'
That looks a lot like your error. I think (am quite sure) there is no "machine_status" column in df1.
Posts: 164
Threads: 88
Joined: Feb 2021
You know that is what I thought also. I will check it and see.
Thanks for your input.
Respectfully,
LZ
Posts: 164
Threads: 88
Joined: Feb 2021
You are correct. But it should be there.
In an earlier line in the program, I used the command.
df1=df
Now I did this before I dropped the three columns from df.
It ("machine_status") should be there, but it is not.
How can I work around this?
I wanted to keep a dataframe from my initial upload.
Then I can work on the initial dataframe to scale and normalize it, and
"attach" the three dropped columns at a later time.
However, this did not work out, and the error is in
df1 = df
Any help appreciated.
Respectfully,
LZ
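One possible explanation, assuming the three columns were dropped in place (for example with inplace=True or del): df1 = df does not copy anything, it only gives the same dataframe a second name, so a column dropped from df disappears from df1 as well. The sketch below uses a tiny made-up frame (the sensor_00 column, its values, and the names alias and backup are hypothetical) to show the difference between that assignment and df.copy().

```python
import pandas as pd

# Hypothetical stand-in for the original data.
df = pd.DataFrame({
    "sensor_00": [0.1, 0.2, 0.3],
    "machine_status": ["NORMAL", "BROKEN", "NORMAL"],
})

alias = df          # like df1 = df: a second name for the same object
backup = df.copy()  # an independent copy of the data

# Drop the column in place, as assumed above.
df.drop(columns=["machine_status"], inplace=True)

print("machine_status" in alias.columns)   # False: alias and df are one object
print("machine_status" in backup.columns)  # True: the copy kept the column

# Put the column back as the last column of df.
df["machine_status"] = backup["machine_status"]
print(df.columns.tolist())
```

If the drop was done some other way, printing df1.columns.tolist() right after the df1 = df line and again after the drop should show where the column goes missing.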