Nov-28-2017, 04:36 PM
Hi,
I wonder if I am on the right track and would like to get your input on my problem:
Goal:
Find a link between two rows in two tables based on a number of criteria.
Approach:
I want to work with a "match score" or "confidence level" to determine, based on all my match criteria, which row in table 2 is most likely related to a given row in table 1.
To keep track of the "match score", I figured a dataframe with the unique row identifiers of table 1 as the index and those of table 2 as the columns would let me apply all my match criteria and keep updating the corresponding "match score" in the dataframe.
Question:
The problem I am having is that my updates to the dataframe are not being saved.
I made a simple example to test my dataframe question. In the example below I point to the intersection of the "match score" that needs to be updated, and update the score, but for the next match the score is again updated from the original value of 0, therefore giving me an end result of 10 instead of my desired 50.
import pandas as pd
import numpy as np

table_1 = ('s1','s2','s3','s4','s5')
table_2 = ('i1','i2','i3','i4','i5')
df = pd.DataFrame(index = table_1, columns = table_2)
df = df.fillna(0)
for s in table_1:
    df2= df.loc['s3','i4'] =+ 10
print(df)
Output:
    i1  i2  i3  i4  i5
s1   0   0   0   0   0
s2   0   0   0   0   0
s3   0   0   0  10   0
s4   0   0   0   0   0
s5   0   0   0   0   0
Do you know how I can save my change to the dataframe? Also, if you have any other conceptual suggestions on how to approach my goal, I am happy to hear them.
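A possible explanation, offered as a sketch rather than a definitive answer: Python reads `=+ 10` as `= (+10)`, i.e. a plain assignment of positive 10, so every pass through the loop overwrites the cell with 10. The augmented assignment operator is `+=`. Assuming the intent is to add 10 on each iteration, the version below accumulates to 50. Creating the frame with `0` as the fill value (instead of calling `fillna(0)` on an empty frame) and dropping the unused `df2` are my own simplifications, not part of the original example.

```python
import pandas as pd

table_1 = ('s1', 's2', 's3', 's4', 's5')
table_2 = ('i1', 'i2', 'i3', 'i4', 'i5')

# Start every match score at 0 (simplification of the fillna(0) step).
df = pd.DataFrame(0, index=list(table_1), columns=list(table_2))

for s in table_1:
    # "+=" adds 10 to the stored score; "=+ 10" would assign +10 each time.
    df.loc['s3', 'i4'] += 10

print(df)
```

With five rows in `table_1`, the loop runs five times, so the `s3`/`i4` cell should end at 50 while every other cell stays at 0.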