Python Forum
Copy a column from one dataframe to another dataframe
#1
I would like to copy a column from one dataframe and add it to a second dataframe as the last column. I know it can be done, but I am looking for a simple way to do it. Is there a one-, two- or three-line way to do it? This is all using pandas and Python, of course.

Any help appreciated.

Thanks in advance.

Respectfully,

LZ
#2
dataframe_a['New Column'] = dataframe_b['Existing Column']
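A minimal sketch of that one-liner, using small made-up frames (the data and the `x`, `dataframe_c` and `By Position` names are hypothetical). A new column name is appended as the last column, and pandas matches the values by index label, not by row position:

```python
import pandas as pd

# Hypothetical example frames; only the column names come from the answer above.
dataframe_a = pd.DataFrame({"x": [10, 20, 30]})
dataframe_b = pd.DataFrame({"Existing Column": ["a", "b", "c"]})

# Assigning a Series under a new name appends it as the last column.
dataframe_a["New Column"] = dataframe_b["Existing Column"]
print(dataframe_a.columns.tolist())  # ['x', 'New Column']

# Values are matched by index label: dataframe_c has labels 1, 2, 3,
# so label 3 (missing from dataframe_b) ends up as NaN.
dataframe_c = pd.DataFrame({"x": [1, 2, 3]}, index=[1, 2, 3])
dataframe_c["New Column"] = dataframe_b["Existing Column"]
print(dataframe_c["New Column"].tolist())  # ['b', 'c', nan]

# When both frames have the same rows in the same order, .to_numpy()
# assigns purely by position and ignores the index.
dataframe_c["By Position"] = dataframe_b["Existing Column"].to_numpy()
print(dataframe_c["By Position"].tolist())  # ['a', 'b', 'c']
```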
#3
Thanks, I will give it a try.

Respectfully,

LZ
#4
I tried the simple command that you gave me:

df['machine_status'] = df1['machine_status']
Now df is the new dataframe and df1 is the old copied dataframe. I am trying to move the column named 'machine_status', from df1 to df and put it in the last column. The column is named 'machine_status' in both cases.
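One detail that matters for "put it in the last column": if `df` still had a `machine_status` column, assigning to that name overwrites it where it stands instead of moving it. A sketch with small hypothetical frames (the `reading` column and all values are made up), using `DataFrame.pop` plus reassignment to move a column to the end:

```python
import pandas as pd

# Small stand-ins for the real frames (hypothetical values).
df1 = pd.DataFrame({"machine_status": ["NORMAL", "BROKEN"], "reading": [1.0, 2.0]})
df = pd.DataFrame({"machine_status": ["?", "?"], "reading": [0.0, 1.0]})

# An existing column name keeps its position when it is reassigned.
df["machine_status"] = df1["machine_status"]
print(df.columns.tolist())  # ['machine_status', 'reading']

# pop() removes the column and returns it; assigning it back appends it last.
df["machine_status"] = df.pop("machine_status")
print(df.columns.tolist())  # ['reading', 'machine_status']
```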

I got the following error:

Error:
KeyError                                  Traceback (most recent call last)
File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\_libs\index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\_libs\index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'machine_status'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Input In [25], in <cell line: 1>()
----> 1 df['machine_status'] = df1['machine_status']

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\frame.py:3505, in DataFrame.__getitem__(self, key)
   3503 if self.columns.nlevels > 1:
   3504     return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
   3506 if is_integer(indexer):
   3507     indexer = [indexer]

File ~\miniconda3\envs\pump-failure-pred\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     # InvalidIndexError. Otherwise we fall through and re-raise
   3627     # the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'machine_status'
Where is the error? I think I entered the Python code correctly, but the interpreter says I did not.

What is a KeyError?
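A `KeyError` is the exception Python raises when a lookup by key fails: a missing key in a `dict`, or, in pandas, a column label that `df[...]` cannot find. A minimal sketch (the `prices` dict and the one-column frame are hypothetical) showing both cases and a membership check that avoids the exception:

```python
import pandas as pd

# Plain dictionaries raise KeyError for a missing key.
prices = {"apple": 1.0}
try:
    prices["pear"]
except KeyError as err:
    print("dict lookup failed:", err)  # dict lookup failed: 'pear'

# df[label] behaves the same way for column labels (hypothetical frame).
df1 = pd.DataFrame({"status": ["NORMAL", "BROKEN"]})
try:
    df1["machine_status"]
except KeyError as err:
    print("column lookup failed:", err)

# Checking membership first avoids the exception and shows what is there.
if "machine_status" not in df1.columns:
    print("available columns:", df1.columns.tolist())
```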

Thanks in advance.

Respectfully,

LZ
#5
Is "machine_status" a key (column header) in df1?

This is how it works in principle:
import pandas as pd

df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4]})

df1["Integers"] = df2["Numbers"]  # The new column name in df1 can be anything; "Numbers" must already exist in df2
print(df1)
Output:
  Letters  Integers
0       A         1
1       B         2
2       C         3
3       D         4
#6
Yes, it is. I took three columns off the df dataframe, but before I did that, I made a copy of df and called it df1.

I am trying to put machine_status back onto the slimmed-down dataframe, df. I have done all of my scaling and normalizing on df, so now I am trying to put it back together. I want to undo the dropping of the three columns, but for now only for the machine_status column.

That is where it fails. df is now only a slimmed-down version of its former self.

But "machine_status" is the name of the column I have been discussing.

I hope this is informative.

Respectfully,

LZ
#7
I like your example. But please understand, doing it that way would be very hard.

My machine_status column has over 220,000 values, so typing them in is not practical. I really want to use the shortcut method, but I keep getting this KeyError.

That is why I chose the shorthand way.
I will keep trying.

R,

LZ
#8
It doesn't make any difference how many "values" there are. The dataframe operations shown work on whole series (columns). A column containing 20,000 values works exactly like one containing 4.

This part is just so I have two dataframes to work with.
df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4]})
If you prefer they could each have 1,000,000 values.
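A quick sketch of that claim, assuming a made-up frame of 1,000,000 rows: the same one-line assignment copies the whole column at once, with no loop or typed-in values.

```python
import pandas as pd

n = 1_000_000  # row count chosen only to show that size does not matter
df1 = pd.DataFrame({"Letters": ["A"] * n})
df2 = pd.DataFrame({"Numbers": range(n)})

# Same one-liner as in the small example; the whole column is copied at once.
df1["Integers"] = df2["Numbers"]
print(df1.shape)  # (1000000, 2)
print(df1.tail(2))
```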

I get an error when I do this:
import pandas as pd

df1 = pd.DataFrame({"Letters": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4, 5]})

df1["Integers"] = df2["Number"]
print(df1)
Error:
Traceback (most recent call last):
  File "...\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Number'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ...test.py", line 6, in <module>
    df1["Integers"] = df2["Number"]
  File "...\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "...\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'Number'
That looks a lot like your error. I think (am quite sure) there is no "machine_status" column in df1.
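One way to confirm this is to print the real column labels and look for near misses. A sketch reusing the frame above; `difflib.get_close_matches` from the standard library is a swapped-in helper for suggesting similar spellings, not something pandas does for you:

```python
import difflib
import pandas as pd

df2 = pd.DataFrame({"Numbers": [1, 2, 3, 4, 5]})
wanted = "Number"

# The list repr shows quotes, so stray spaces or case differences stand out.
print(df2.columns.tolist())  # ['Numbers']

suggestions = []
if wanted not in df2.columns:
    # Suggest existing labels spelled similarly to the requested one.
    suggestions = difflib.get_close_matches(wanted, [str(c) for c in df2.columns])
    print(f"{wanted!r} not found; did you mean {suggestions}?")
```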
#9
You know, that is what I thought too. I will check it and see.

Thanks for your input.

Respectfully,

LZ
#10
You are correct. But it should be there.

Earlier in the program, I used this command:

df1=df

Now I did this before I dropped the three columns from df.

It ("machine_status") should be there, but it is not.
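A likely explanation, assuming the three columns were dropped in place (for example with `inplace=True`; the post does not say how they were dropped): `df1 = df` does not copy anything. It just gives the same DataFrame a second name, so an in-place change made through `df` is also visible through `df1`. A sketch with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame; only the machine_status name comes from the thread.
df = pd.DataFrame({"machine_status": ["NORMAL", "BROKEN"], "reading": [0.5, 0.7]})

df1 = df          # a second name for the SAME object, not a copy
print(df1 is df)  # True

# An in-place drop mutates the one shared object...
df.drop(columns=["machine_status"], inplace=True)

# ...so the column is gone when looked up through either name.
print("machine_status" in df1.columns)  # False
```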

How can I work around this?

I wanted to keep a copy of the dataframe as it was initially loaded.
Then I can scale and normalize the working dataframe and
"attach" the three dropped columns again later.

However, this did not work out, and the problem is in

df1 = df
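A minimal sketch of the intended workflow, assuming `df1 = df.copy()` replaces `df1 = df`. The data, the `reading` column and the min-max scaling are hypothetical stand-ins for the real scaling and normalizing steps:

```python
import pandas as pd

# Hypothetical raw data standing in for the initially loaded frame.
df = pd.DataFrame({
    "machine_status": ["NORMAL", "NORMAL", "BROKEN"],
    "reading": [10.0, 20.0, 40.0],
})

df1 = df.copy()  # independent (deep by default) copy, unaffected by later changes to df

# Drop the non-numeric column, then rescale the rest (simple min-max scaling).
df = df.drop(columns=["machine_status"])
df = (df - df.min()) / (df.max() - df.min())

# Reattach the saved column; it becomes the last column and aligns on the index.
df["machine_status"] = df1["machine_status"]
print(df)
```

If the scaling step returns a NumPy array (scikit-learn scalers do by default) and the frame is rebuilt with a new index, check that the index still matches `df1`'s before assigning, because the assignment aligns on index labels.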

Any help appreciated.

Respectfully,

LZ