 Merging two DataFrames based on indexes from two other DataFrames
#1
I'm new to pandas. I have gone through the docs and experimented with various examples, but the problem I'm tackling has really stumped me.

I have the following two dataframes (DataA/DataB) which I would like to merge on a per global_index/item/values basis.

DataA                      DataB
row  item_id  valueA       row    item_id  valueB
0    x        A1           0      x        B1
1    y        A2           1      y        B2
2    z        A3           2      x        B3
3    x        A4           3      y        B4
4    z        A5           4      z        B5
5    x        A6           5      x        B6
6    y        A7           6      y        B7
7    z        A8           7      z        B8
The list of items (item_ids) is finite, and each of the two dataframes holds the value of one trait (trait A or trait B) for an item at a given global_index value.

The global_index can roughly be thought of as a unit of "time".


The mapping between each data frame (DataA/DataB) and the global_index is done via the following two mapper DFs:

DataA_mapper
global_index  start_row  num_rows
0             0          3
1             3          2
3             5          3


DataB_mapper
global_index  start_row  num_rows
0             0          2
2             2          3
4             5          3
Simply put, for a given global_index, each mapper defines the block of rows (num_rows rows beginning at start_row) in its respective DF (DataA or DataB) that are associated with that global_index.
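To make the mapper idea concrete, here is a minimal sketch that turns DataA_mapper into a per-row global_index label. The variable names (data_a_mapper, row_to_global) are mine and purely illustrative, and it assumes each mapper entry covers the half-open range of rows from start_row up to start_row + num_rows.

```python
import numpy as np
import pandas as pd

# DataA_mapper exactly as listed above.
data_a_mapper = pd.DataFrame(
    {"global_index": [0, 1, 3], "start_row": [0, 3, 5], "num_rows": [3, 2, 3]}
)

# Row numbers covered by each mapper entry: start_row .. start_row + num_rows - 1.
rows = np.concatenate(
    [
        np.arange(start, start + count)
        for start, count in zip(data_a_mapper["start_row"], data_a_mapper["num_rows"])
    ]
)

# Repeat each global_index once for every row it covers.
labels = np.repeat(
    data_a_mapper["global_index"].to_numpy(), data_a_mapper["num_rows"].to_numpy()
)

# Lookup from DataA row number to global_index.
row_to_global = pd.Series(labels, index=rows, name="global_index")
print(row_to_global)
```

With the sample mapper, rows 0-2 map to global_index 0, rows 3-4 to 1, and rows 5-7 to 3.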

I would like to merge the DFs so that I get the following dataframe:

row   global_index  item_id   valueA   valueB
0     0             x         A1        B1
1     0             y         A2        B2
2     0             z         A3        NaN
3     1             x         A4        B1
4     1             z         A5        NaN
5     2             x         A4        B3
6     2             y         A2        B4
7     2             z         A5        B5
8     3             x         A6        B3
9     3             y         A7        B4
10    3             z         A8        B5
11    4             x         A6        B6
12    4             y         A7        B7
13    4             z         A8        B8
In the final dataframe, any global_index/item_id pair will have exactly one of the following:

  1. a value for both valueA and valueB
  2. a value only for valueA
  3. a value only for valueB

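The requirement is that when only one value exists for a given global_index/item_id pair (e.g. valueA but no valueB), the last known value of the missing column for that item is used. If the item has no earlier value, the entry stays NaN, as for item z at global_index 0 and 1 above.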
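One possible way to get there, offered as a sketch rather than a definitive answer: label every row with its global_index, outer-merge the two frames on global_index/item_id, then forward-fill each value column within each item. The helper name attach_global_index and the variable names are hypothetical, and the sketch assumes the rows of the expected result are ordered by global_index and then item_id.

```python
import numpy as np
import pandas as pd

data_a = pd.DataFrame({"item_id": list("xyzxzxyz"),
                       "valueA": [f"A{i}" for i in range(1, 9)]})
data_b = pd.DataFrame({"item_id": list("xyxyzxyz"),
                       "valueB": [f"B{i}" for i in range(1, 9)]})
data_a_mapper = pd.DataFrame(
    {"global_index": [0, 1, 3], "start_row": [0, 3, 5], "num_rows": [3, 2, 3]})
data_b_mapper = pd.DataFrame(
    {"global_index": [0, 2, 4], "start_row": [0, 2, 5], "num_rows": [2, 3, 3]})


def attach_global_index(data, mapper):
    """Hypothetical helper: return the mapped rows of `data` with a global_index column."""
    rows = np.concatenate(
        [np.arange(s, s + n) for s, n in zip(mapper["start_row"], mapper["num_rows"])]
    )
    labels = np.repeat(mapper["global_index"].to_numpy(), mapper["num_rows"].to_numpy())
    out = data.iloc[rows].copy()
    out.insert(0, "global_index", labels)
    return out.reset_index(drop=True)


a = attach_global_index(data_a, data_a_mapper)
b = attach_global_index(data_b, data_b_mapper)

# Outer merge keeps every global_index/item_id pair present in either frame.
merged = (
    a.merge(b, on=["global_index", "item_id"], how="outer")
    .sort_values(["global_index", "item_id"])
    .reset_index(drop=True)
)

# Within each item, carry the last known value forward in global_index order.
# Entries with no earlier value for that item remain NaN.
value_cols = ["valueA", "valueB"]
merged[value_cols] = merged.groupby("item_id")[value_cols].ffill()

print(merged)
```

The forward fill is done per item_id so that a value from one item never leaks into another; sorting by global_index first is what makes "last value" mean "most recent in time".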