 Merging two DataFrames based on indexes from two other DataFrames
I'm new to pandas and have tried going through the docs and experimenting with various examples, but this problem I'm tackling has really stumped me.

I have the following two DataFrames (DataA/DataB), which I would like to merge on a per global_index/item_id basis.

DataA                      DataB
row  item_id  valueA       row    item_id  valueB
0    x        A1           0      x        B1
1    y        A2           1      y        B2
2    z        A3           2      x        B3
3    x        A4           3      y        B4
4    z        A5           4      z        B5
5    x        A6           5      x        B6
6    y        A7           6      y        B7
7    z        A8           7      z        B8
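
To make the example reproducible, here is a minimal sketch that builds the two tables above, assuming "row" is simply the default integer index. The variable names data_a and data_b are hypothetical stand-ins for DataA/DataB.

```python
import pandas as pd

# Sample data transcribed from the DataA/DataB tables above.
# The "row" column in the tables corresponds to the default RangeIndex here.
data_a = pd.DataFrame({
    "item_id": ["x", "y", "z", "x", "z", "x", "y", "z"],
    "valueA": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
})
data_b = pd.DataFrame({
    "item_id": ["x", "y", "x", "y", "z", "x", "y", "z"],
    "valueB": ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"],
})

print(data_a)
print(data_b)
```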
The list of items (item_ids) is finite, and each of the two DataFrames represents the value of a trait (trait A, trait B) for an item at a given global_index value.

The global_index can roughly be thought of as a unit of "time".


The mapping between each DataFrame (DataA/DataB) and the global_index is done via the following two mapper DataFrames:

DataA_mapper
global_index  start_row  num_rows
0             0          3
1             3          2
3             5          3


DataB_mapper
global_index  start_row  num_rows
0             0          2
2             2          3
4             5          3

Simply put, for a given global_index, each mapper identifies the rows of its respective DataFrame (DataA or DataB) that are associated with that global_index: the num_rows consecutive rows beginning at start_row.
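
A minimal sketch of how a mapper can be expanded into one (global_index, row) pair per data row, assuming the blocks are contiguous and start_row/num_rows are plain integers. The function name expand_mapper is hypothetical.

```python
import numpy as np
import pandas as pd

# Mapper tables transcribed from the post.
data_a_mapper = pd.DataFrame({"global_index": [0, 1, 3], "start_row": [0, 3, 5], "num_rows": [3, 2, 3]})
data_b_mapper = pd.DataFrame({"global_index": [0, 2, 4], "start_row": [0, 2, 5], "num_rows": [2, 3, 3]})


def expand_mapper(mapper: pd.DataFrame) -> pd.DataFrame:
    """Turn each (global_index, start_row, num_rows) entry into one
    (global_index, row) pair per covered row: start_row .. start_row + num_rows - 1."""
    # Repeat each global_index once per row it covers.
    global_index = np.repeat(mapper["global_index"].to_numpy(), mapper["num_rows"].to_numpy())
    # Build the matching row numbers block by block.
    rows = np.concatenate(
        [np.arange(s, s + n) for s, n in zip(mapper["start_row"], mapper["num_rows"])]
    )
    return pd.DataFrame({"global_index": global_index, "row": rows})


print(expand_mapper(data_a_mapper))
print(expand_mapper(data_b_mapper))
```

For DataA_mapper this assigns global_index 0 to rows 0-2, 1 to rows 3-4 and 3 to rows 5-7; joining the result on row would attach a global_index to every row of DataA.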

I would like to merge the DFs so that I get the following dataframe:

row   global_index  item_id   valueA   valueB
0     0             x         A1        B1
1     0             y         A2        B2
2     0             z         A3        NaN
3     1             x         A4        B1
4     1             z         A5        NaN
5     2             x         A4        B3
6     2             y         A2        B4
7     2             z         A5        B5
8     3             x         A6        B3
9     3             y         A7        B4
10    3             z         A8        B5
11    4             x         A6        B6
12    4             y         A7        B7
13    4             z         A8        B8

In the final DataFrame, any given global_index/item_id pair will have exactly one of the following:

  1. a value for both valueA and valueB
  2. a value only for valueA
  3. a value only for valueB

The requirement is that if a given global_index/item_id pair has only one value (e.g. valueA but no valueB), the last known value of the missing one for that item_id should be used. If there is no earlier value, it stays NaN (as for z at global_index 0).
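
One possible approach, sketched under the assumption that each DataFrame's row number is its default integer index: expand each mapper into (global_index, row) pairs, attach those to the data rows, outer-merge on global_index/item_id, then forward-fill within each item_id. The helper names expand_mapper and tag_with_global_index are hypothetical, not pandas API.

```python
import numpy as np
import pandas as pd


def expand_mapper(mapper: pd.DataFrame) -> pd.DataFrame:
    """One (global_index, row) pair per data row covered by the mapper."""
    global_index = np.repeat(mapper["global_index"].to_numpy(), mapper["num_rows"].to_numpy())
    rows = np.concatenate(
        [np.arange(s, s + n) for s, n in zip(mapper["start_row"], mapper["num_rows"])]
    )
    return pd.DataFrame({"global_index": global_index, "row": rows})


def tag_with_global_index(data: pd.DataFrame, mapper: pd.DataFrame) -> pd.DataFrame:
    """Attach global_index to each data row by joining on the row number (the data's index)."""
    pairs = expand_mapper(mapper)
    return pairs.merge(data, left_on="row", right_index=True).drop(columns="row")


# Sample data and mappers transcribed from the post.
data_a = pd.DataFrame({"item_id": list("xyzxzxyz"), "valueA": [f"A{i}" for i in range(1, 9)]})
data_b = pd.DataFrame({"item_id": list("xyxyzxyz"), "valueB": [f"B{i}" for i in range(1, 9)]})
data_a_mapper = pd.DataFrame({"global_index": [0, 1, 3], "start_row": [0, 3, 5], "num_rows": [3, 2, 3]})
data_b_mapper = pd.DataFrame({"global_index": [0, 2, 4], "start_row": [0, 2, 5], "num_rows": [2, 3, 3]})

a = tag_with_global_index(data_a, data_a_mapper)
b = tag_with_global_index(data_b, data_b_mapper)

# Outer merge keeps every global_index/item_id pair that appears in either frame.
merged = (
    a.merge(b, on=["global_index", "item_id"], how="outer")
    .sort_values(["global_index", "item_id"])
    .reset_index(drop=True)
)

# Within each item, carry the last known value forward in global_index order;
# pairs with no earlier value stay NaN.
value_cols = ["valueA", "valueB"]
merged[value_cols] = merged.groupby("item_id")[value_cols].ffill()
print(merged)
```

The outer merge only produces pairs that exist in at least one frame, so y does not appear at global_index 1, and because the forward fill is grouped by item_id, a value is never borrowed from a different item.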