Python Forum

Full Version: Merging rows and adding columns based on matching index
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello community, tried this a million different ways with pandas groupby but no luck, any help is appreciated. First year learning python and still trying to figure things out. Thanks!

ID name code
ABC hans apple
ABC hans pear
ABC hans kale
DEF holly tomato
HIJ jon grapes

ID name code 1 code 2 code 3
ABC hans apple pear kale
DEF holly tomato
HIJ jon grapes
A million different ways? Or maybe 5 different ways. Could you post one of those ways an example? It would provide useful information about the structure of your data and dataframes. It would also provide a lot of information about why you are having difficulties.
With this data:

Output:
ID name code 0 ABC hans apple 1 ABC hans pear 2 ABC hans kale 3 DEF holly tomato 4 HIJ jon grapes
Could try like this:

df.groupby(['name'])['name'].count()
Output:
name hans 3 holly 1 jon 1 Name: name, dtype: int64
Or like this:

print(df.groupby('name').groups)
Output:
{'hans': [0, 1, 2], 'holly': [3], 'jon': [4]}
Depends on what you need!
Stuck half way up this mountain, I have time. Playing with your data, I got this:

path2csv2 = '/home/pedro/myPython/pandas/csv_files/info2.csv'
df2 = pd.read_csv(path2csv2)
path2csv3 = '/home/pedro/myPython/pandas/csv_files/info3.csv'
df3 = pd.read_csv(path2csv3)
df2.update(df3)
df4 = pd.merge(df2, df3, left_on=['ID','name'], right_on = ['ID','name'])
df5 = pd.merge(df2, df3, how='left', on=['ID', 'name'])
df6 = pd.merge(df2, df3, how='outer', on=['ID', 'name'])
Output:
ID name code code 1 code 2 code 3 0 ABC hans apple apple pear kale 1 ABC hans pear apple pear kale 2 ABC hans kale apple pear kale 3 DEF holly tomato tomato NaN NaN 4 HIJ jon grapes grapes NaN NaN
There are so many settings for pandas, you need to know exactly what you want. how='outer' seems to get a union of sets, whereas how='inner' seems to get an intersection of sets. But you can use many different methods to get what you want.

frames = [df2, df3]
result = pd.concat(frames)
Output:
result ID name code code 1 code 2 code 3 0 ABC hans apple NaN NaN NaN 1 ABC hans pear NaN NaN NaN 2 ABC hans kale NaN NaN NaN 3 DEF holly tomato NaN NaN NaN 4 HIJ jon grapes NaN NaN NaN 0 ABC hans NaN apple pear kale 1 DEF holly NaN tomato NaN NaN 2 HIJ jon NaN grapes NaN NaN