Dec-24-2023, 11:51 AM
Stuck half way up this mountain, I have time. Playing with your data, I got this:
path2csv2 = '/home/pedro/myPython/pandas/csv_files/info2.csv' df2 = pd.read_csv(path2csv2) path2csv3 = '/home/pedro/myPython/pandas/csv_files/info3.csv' df3 = pd.read_csv(path2csv3) df2.update(df3) df4 = pd.merge(df2, df3, left_on=['ID','name'], right_on = ['ID','name']) df5 = pd.merge(df2, df3, how='left', on=['ID', 'name']) df6 = pd.merge(df2, df3, how='outer', on=['ID', 'name'])
Output: ID name code code 1 code 2 code 3
0 ABC hans apple apple pear kale
1 ABC hans pear apple pear kale
2 ABC hans kale apple pear kale
3 DEF holly tomato tomato NaN NaN
4 HIJ jon grapes grapes NaN NaN
There are so many settings for pandas, you need to know exactly what you want. how='outer' seems to get a union of sets, whereas how='inner' seems to get an intersection of sets. But you can use many different methods to get what you want.frames = [df2, df3] result = pd.concat(frames)
Output:result
ID name code code 1 code 2 code 3
0 ABC hans apple NaN NaN NaN
1 ABC hans pear NaN NaN NaN
2 ABC hans kale NaN NaN NaN
3 DEF holly tomato NaN NaN NaN
4 HIJ jon grapes NaN NaN NaN
0 ABC hans NaN apple pear kale
1 DEF holly NaN tomato NaN NaN
2 HIJ jon NaN grapes NaN NaN