Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
dataframe merge
#1
hello, i am a little bit frustrated with merge.

I habe two tables:

PZN | Name | Anzahl
1234 | Aspirin | 1
2345 | Mexalen | 2
4567 | Sinupret |9
4567 | Sinupret |2

and

PZN | MB | Zeichen
1234 | 1| +
2345 | 2| ++
4567 | 0| ++

I would like to merge the two tables with "PZN" and only column MB from second table
so that I get

PZN | Name | Anzahl | MB
1234 | Aspirin | 1 |1
2345 | Mexalen | 2 |2
4567 | Sinupret |9 |0
4567 | Sinupret |2 |0

How do I achieve this?
Reply
#2
It appears to be straight forward. The "trick" is only merging some columns from the second dataframe. This can be accomplished by merging a dataframe that only contains the columns you want to merge.

The code below uses a dataframe slice that only has the PZN and MB columns.
import pandas as pd
from io import StringIO

data1 = StringIO(
    """PZN|Name|Anzahl
1234|Aspirin|1
2345|Mexalen|2
4567|Sinupret|9
4567|Sinupret|2
"""
)
data2 = StringIO(
    """PZN|MB|Zeichen
1234|1|+
2345|2|++
4567|0|++
"""
)

df1 = pd.read_csv(data1, sep="|")
df2 = pd.read_csv(data2, sep="|")
print(df1)
print(df2)  # All columns
print(df2[["PZN", "MB"]])  # Only PZN and MB columns
df3 = df1.merge(df2[["PZN", "MB"]], on="PZN")
print(df3)
Output:
PZN Name Anzahl 0 1234 Aspirin 1 1 2345 Mexalen 2 2 4567 Sinupret 9 3 4567 Sinupret 2 PZN MB Zeichen 0 1234 1 + 1 2345 2 ++ 2 4567 0 ++ PZN MB 0 1234 1 1 2345 2 2 4567 0 PZN Name Anzahl MB 0 1234 Aspirin 1 1 1 2345 Mexalen 2 2 2 4567 Sinupret 9 0 3 4567 Sinupret 2 0
Is that the desired output?

Are you reading the data from a csv file? The whitespace around the "|" are a problem. If my data looks like this:
Output:
data1 = StringIO( """PZN | Name | Anzahl 1234|Aspirin|1 2345|Mexalen|2 4567|Sinupret|9 4567|Sinupret|2 """ ) data2 = StringIO( """PZN|MB|Zeichen 1234|1|+ 2345|2|++ 4567|0|++ """ )
I get an error on the merge because one table has "PZN " as a header and the other has "PZN". The best way to fix the problem is clean it up in the files.
Reply
#3
Like deanhystad said, tidy up the input data!
Erst einmal reinige die Daten!

data1 = """1234 | Aspirin | 1
2345 | Mexalen | 2
4567 | Sinupret | 9
4567 | Sinupret | 2"""

cols1 = ['PZN', 'Name', 'Anzahl']
list1 = [s.split(' | ') for s in data1.split('\n')]
df1 = pd.DataFrame(list1, columns=cols1)

data2 = """1234 | 1 | +
2345 | 2 | ++
4567 | 0 | ++"""

cols2 = ['PZN', 'MB', 'Zeichen']
list2 = [s.split(' | ') for s in data2.split('\n')]
df2 = pd.DataFrame(list2, columns=cols2)

df3 = df1.merge(df2[["PZN", "MB"]], on="PZN", how='inner')
df4 = pd.merge(df1, df2, on='PZN', how='inner')
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Question How can I merge several images via loop using the information of a dataframe? noahverner1995 1 2,075 Dec-31-2021, 05:03 AM
Last Post: noahverner1995
  Mail Merge from DataFrame into Word Label Template PEGylated_User 0 2,628 Nov-10-2020, 01:01 PM
Last Post: PEGylated_User
  Pands DataFrame merge question FORTITUDE 0 2,346 Sep-20-2018, 01:14 PM
Last Post: FORTITUDE

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020