Jan-16-2020, 08:36 PM
I'm new to pandas. I have gone through the docs and experimented with various examples, but the problem I'm tackling has really stumped me.
I have the following two dataframes (DataA/DataB) which I would like to merge on a per global_index/item/values basis.
The list of items (item_ids) is finite, and each of the two dataframes represents the value of a trait (trait A, trait B) for an item at a given global_index value.
The global_index can roughly be thought of as a unit of "time".
The mapping between each data frame (DataA/DataB) and the global_index is done via the following two mapper DFs:
Simply put, for a given global_index, the mapper defines which rows of its respective DF (DataA or DataB) are associated with that global_index: num_rows consecutive rows beginning at start_row.
I would like to merge the DFs so that I get the following dataframe:
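In the final dataframe, any global_index/item_id pair will only ever have one of three combinations of source values, listed after the expected output below.
The requirement is that if only one value exists for a given global_index/item_id pair (e.g. valueA but no valueB), the last known value of the missing one is used; if there is no earlier value, it stays NaN.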
DataA                       DataB
row  item_id  valueA        row  item_id  valueB
0    x        A1            0    x        B1
1    y        A2            1    y        B2
2    z        A3            2    x        B3
3    x        A4            3    y        B4
4    z        A5            4    z        B5
5    x        A6            5    x        B6
6    y        A7            6    y        B7
7    z        A8            7    z        B8
DataA_mapper
global_index  start_row  num_rows
0             0          3
1             3          2
3             5          3

DataB_mapper
global_index  start_row  num_rows
0             0          2
2             2          3
4             5          3
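To make the mapper semantics concrete, here is a minimal sketch of how a mapper could be used to tag each data row with its global_index. It assumes the `row` column is simply the positional index of the frame, and the helper name `attach_global_index` is hypothetical (it is not a pandas function); only DataA is shown.

```python
import pandas as pd

# Sample data copied from the post (the "row" column is taken to be the positional index).
data_a = pd.DataFrame({
    "item_id": ["x", "y", "z", "x", "z", "x", "y", "z"],
    "valueA": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
})
mapper_a = pd.DataFrame({
    "global_index": [0, 1, 3],
    "start_row": [0, 3, 5],
    "num_rows": [3, 2, 3],
})


def attach_global_index(data, mapper):
    """Hypothetical helper: tag each row of `data` with the global_index
    whose [start_row, start_row + num_rows) range contains it."""
    parts = []
    for m in mapper.itertuples(index=False):
        # Slice the block of rows that belongs to this global_index.
        block = data.iloc[m.start_row : m.start_row + m.num_rows].copy()
        block["global_index"] = m.global_index
        parts.append(block)
    return pd.concat(parts, ignore_index=True)


a_tagged = attach_global_index(data_a, mapper_a)
print(a_tagged)
```

Rows 0 to 2 end up with global_index 0, rows 3 and 4 with 1, and rows 5 to 7 with 3. A plain loop is used for readability; for large mappers a vectorised approach may be preferable.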
Expected merged dataframe:

row  global_index  item_id  valueA  valueB
0    0             x        A1      B1
1    0             y        A2      B2
2    0             z        A3      NaN
3    1             x        A4      B1
4    1             z        A5      NaN
5    2             x        A4      B3
6    2             y        A2      B4
7    2             z        A5      B5
8    3             x        A6      B3
9    3             y        A7      B4
10   3             z        A8      B5
11   4             x        A6      B6
12   4             y        A7      B7
13   4             z        A8      B8
- a value for both valueA and valueB
- a value only for valueA
- a value only for valueB
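One possible way to get the desired frame, offered as a sketch rather than a definitive answer: tag both frames with their global_index, outer-merge on global_index/item_id, then forward-fill each value column per item. The helper `attach_global_index` is a hypothetical name, and the sketch assumes the `row` column is just the positional index.

```python
import pandas as pd

# Sample data copied from the post.
data_a = pd.DataFrame({
    "item_id": ["x", "y", "z", "x", "z", "x", "y", "z"],
    "valueA": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
})
data_b = pd.DataFrame({
    "item_id": ["x", "y", "x", "y", "z", "x", "y", "z"],
    "valueB": ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8"],
})
mapper_a = pd.DataFrame({"global_index": [0, 1, 3], "start_row": [0, 3, 5], "num_rows": [3, 2, 3]})
mapper_b = pd.DataFrame({"global_index": [0, 2, 4], "start_row": [0, 2, 5], "num_rows": [2, 3, 3]})


def attach_global_index(data, mapper):
    """Hypothetical helper: tag each row with the global_index that owns it."""
    parts = []
    for m in mapper.itertuples(index=False):
        block = data.iloc[m.start_row : m.start_row + m.num_rows].copy()
        block["global_index"] = m.global_index
        parts.append(block)
    return pd.concat(parts, ignore_index=True)


a = attach_global_index(data_a, mapper_a)
b = attach_global_index(data_b, mapper_b)

# The outer merge keeps every global_index/item_id pair present in either frame.
merged = (
    a.merge(b, on=["global_index", "item_id"], how="outer")
    .sort_values(["global_index", "item_id"])
    .reset_index(drop=True)
)

# Within each item, carry the last known value forward in global_index order.
# Pairs with no earlier value (e.g. valueB for z at global_index 0) stay NaN.
value_cols = ["valueA", "valueB"]
merged[value_cols] = merged.groupby("item_id")[value_cols].ffill()

merged = merged[["global_index", "item_id", "valueA", "valueB"]]
print(merged)
```

On the sample data this reproduces the 14 rows of the expected output. The sort before the forward fill matters, since `ffill` works in row order within each item group.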