Python Forum
Merging rows and adding columns based on matching index
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Merging rows and adding columns based on matching index
#1
Hello community, tried this a million different ways with pandas groupby but no luck, any help is appreciated. First year learning python and still trying to figure things out. Thanks!

ID name code
ABC hans apple
ABC hans pear
ABC hans kale
DEF holly tomato
HIJ jon grapes

ID name code 1 code 2 code 3
ABC hans apple pear kale
DEF holly tomato
HIJ jon grapes
Reply
#2
A million different ways? Or maybe 5 different ways. Could you post one of those ways an example? It would provide useful information about the structure of your data and dataframes. It would also provide a lot of information about why you are having difficulties.
Reply
#3
With this data:

Output:
ID name code 0 ABC hans apple 1 ABC hans pear 2 ABC hans kale 3 DEF holly tomato 4 HIJ jon grapes
Could try like this:

df.groupby(['name'])['name'].count()
Output:
name hans 3 holly 1 jon 1 Name: name, dtype: int64
Or like this:

print(df.groupby('name').groups)
Output:
{'hans': [0, 1, 2], 'holly': [3], 'jon': [4]}
Depends on what you need!
Reply
#4
Stuck half way up this mountain, I have time. Playing with your data, I got this:

path2csv2 = '/home/pedro/myPython/pandas/csv_files/info2.csv'
df2 = pd.read_csv(path2csv2)
path2csv3 = '/home/pedro/myPython/pandas/csv_files/info3.csv'
df3 = pd.read_csv(path2csv3)
df2.update(df3)
df4 = pd.merge(df2, df3, left_on=['ID','name'], right_on = ['ID','name'])
df5 = pd.merge(df2, df3, how='left', on=['ID', 'name'])
df6 = pd.merge(df2, df3, how='outer', on=['ID', 'name'])
Output:
ID name code code 1 code 2 code 3 0 ABC hans apple apple pear kale 1 ABC hans pear apple pear kale 2 ABC hans kale apple pear kale 3 DEF holly tomato tomato NaN NaN 4 HIJ jon grapes grapes NaN NaN
There are so many settings for pandas, you need to know exactly what you want. how='outer' seems to get a union of sets, whereas how='inner' seems to get an intersection of sets. But you can use many different methods to get what you want.

frames = [df2, df3]
result = pd.concat(frames)
Output:
result ID name code code 1 code 2 code 3 0 ABC hans apple NaN NaN NaN 1 ABC hans pear NaN NaN NaN 2 ABC hans kale NaN NaN NaN 3 DEF holly tomato NaN NaN NaN 4 HIJ jon grapes NaN NaN NaN 0 ABC hans NaN apple pear kale 1 DEF holly NaN tomato NaN NaN 2 HIJ jon NaN grapes NaN NaN
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Pandas Dataframe Filtering based on rows mvdlm 0 1,452 Apr-02-2022, 06:39 PM
Last Post: mvdlm
  [Pandas] Help with finding only non-matching rows LowEnd 3 3,595 Nov-12-2021, 02:34 PM
Last Post: jefsummers
  numpy adding columns rwahdan 4 2,348 Sep-21-2021, 08:25 PM
Last Post: deanhystad
  [split] Getting Index Error - list index out of range krishna 2 2,631 Jan-09-2021, 08:29 AM
Last Post: buran
  Extracting rows based on condition on one column Robotguy 2 2,236 Aug-07-2020, 02:27 AM
Last Post: Robotguy
  Dropping Rows From A Data Frame Based On A Variable JoeDainton123 1 2,238 Aug-03-2020, 02:05 AM
Last Post: scidam
Question Dividing a single column of dataframe into multiple columns based on char length darpInd 2 2,486 Mar-14-2020, 09:19 AM
Last Post: scidam
  Getting Index Error - list index out of range RahulSingh 2 6,183 Feb-03-2020, 07:17 AM
Last Post: RahulSingh
  Merging two DataFrames based on indexes from two other DataFrames lucinda_rigeitti 0 1,763 Jan-16-2020, 08:36 PM
Last Post: lucinda_rigeitti
  read_csv error and rows/columns missing karlito 9 5,367 Nov-11-2019, 06:48 AM
Last Post: karlito

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020