Python Forum
Merging two DataFrames based on indexes from two other DataFrames
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Merging two DataFrames based on indexes from two other DataFrames
#1
I'm new to pandas have tried going through the docs and experiment with various examples, but this problem I'm tacking has really stumped me.

I have the following two dataframes (DataA/DataB) which I would like to merge on a per global_index/item/values basis.

DataA                      DataB
row  item_id  valueA       row    item_id  valueB
0    x        A1           0      x        B1
1    y        A2           1      y        B2
2    z        A3           2      x        B3
3    x        A4           3      y        B4
4    z        A5           4      z        B5
5    x        A6           5      x        B6
6    y        A7           6      y        B7
7    z        A8           7      z        B8
The list of items(item_ids) is finite and each of the two dataframes represent a the value of a trait (trait A, trait B) for an item at a given global_index value.

The global_index could roughly be thought of as a unit of "time"


The mapping between each data frame (DataA/DataB) and the global_index is done via the following two mapper DFs:

DataA_mapper
global_index  start_row  num_rows
0             0          3
1             3          2
3             5          3


DataB_mapper
global_index  start_row  num_rows
0             0          2
2             2          3
4             5          3
Simply put for a given global_index the mapper will define a list of rows into its respective DF (DataA or DataB) that are associated with that global_index.

I would like to merge the DFs so that I get the following dataframe:

row   global_index  item_id   valueA   valueB
0     0             x         A1        B1
1     0             y         A2        B2
2     0             z         A3        NaN
3     1             x         A4        B1
4     1             z         A5        NaN
5     2             x         A4        B3
6     2             y         A2        B4
7     2             z         A5        B5
8     3             x         A6        B3
9     3             y         A7        B4
10    3             z         A8        B5
11    4             x         A6        B6
12    4             y         A7        B7
13    4             z         A8        B8
In the final datafram any pair of global_index/item_id there will ever be either:

  1. a value for both valueA and valueB
  2. a value only for valueA
  3. a value only for valueB

With the requirement being if there is only one value for a given global_index/item (eg: valueA but no valueB) for the last value of the missing one to be used.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Merging rows and adding columns based on matching index pythonnewbie78 3 748 Dec-24-2023, 11:51 AM
Last Post: Pedroski55
  Pandas dataframes and numpy arrays bytecrunch 1 1,292 Oct-11-2022, 08:08 PM
Last Post: Larz60+
  Why can't I display these dataframes? brady951 0 1,252 Nov-18-2021, 02:24 PM
Last Post: brady951
  Working with dataframes rotemkat 2 1,991 Aug-04-2021, 11:24 AM
Last Post: jefsummers
  Help setting up dataframes yk303 0 1,637 Jan-05-2021, 10:33 PM
Last Post: yk303
  Matching multiple conditions and mapping multiple results, between two dataframes daveyg33 2 1,758 Jan-02-2021, 10:32 AM
Last Post: daveyg33
  Pandas/Dataframes, Strings and Regular Expressions... Stephan 0 1,283 Nov-25-2020, 08:08 AM
Last Post: Stephan
  matching a row from 2 seperate dataframes and minimg a column randor 0 1,515 Sep-03-2020, 08:57 PM
Last Post: randor
  Simple way to view dataframes Davewd 1 1,600 Aug-24-2020, 12:48 PM
Last Post: Davewd
  Merging sorted dataframes using Pandas Robotguy 1 2,165 Aug-12-2020, 07:11 PM
Last Post: jefsummers

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020