Feb-11-2021, 09:14 AM
Hello,
I use the SequenceMatcher ratio to match two dataframes by the best ratio.
I want to check first whether the score between A and AA is good, then whether the score between B and BB is good, then whether the score between C and CC is good, and only then add the row.
I have two dataframes, df1:
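For reference, `SequenceMatcher(None, a, b).ratio()` returns 2*M/T, where M is the number of matched characters and T = len(a) + len(b). For example, "pizza" vs "za" shares "za", so it scores 2*2/7 ≈ 0.571. The three-step check described above can be sketched for a single pair of rows as follows. This assumes every cell is already a string and reuses the 0.9 threshold from the attempt further down. `passes_all_steps` is a hypothetical helper name, not part of difflib.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # ratio() = 2*M/T: M = matched characters, T = len(a) + len(b)
    return SequenceMatcher(None, a, b).ratio()


def passes_all_steps(row1, row2, threshold=0.9):
    """Compare A/AA, then B/BB, then C/CC (positionally).

    Stops at the first pair whose ratio is not above the threshold.
    Assumes all cells are strings.
    """
    for a, b in zip(row1, row2):
        if similar(a, b) <= threshold:
            return False
    return True


print(similar("pizza", "za"))
print(passes_all_steps(("pizza", "ze", "3"), ("pizza", "ze", "3")))
print(passes_all_steps(("polo", "fe", "5"), ("polo", "ko", "5")))  # fails at B/BB
```

Because the loop returns on the first failing pair, B/BB is only compared when A/AA passed, and C/CC only when both earlier steps passed.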
       A   B    C
0  pizza  ze    3
1   polo  fe    5
2  ninja  fi  NaN

and df2:
      AA  BB     CC
0     za  ze    NaN
1     po  ka      8
2     fe  fe      6
3  pizza  fi      3
4   polo  ko      5
5  ninja   3  pizza

I tried this function, but it doesn't work:
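import numpy as np
import pandas as pd
from difflib import SequenceMatcher


def similar(a, b):
    # similarity in [0, 1]: 2*M/T, M = matched characters, T = len(a) + len(b)
    ratio = SequenceMatcher(None, a, b).ratio()
    return ratio


order = []
score = []
for index, row in df1.iterrows():
    # step 1: best A/AA ratio over all rows of df2
    maxima = [similar(row['A'], j) for j in df2['AA']]
    best_ratio = max(maxima)
    if best_ratio > 0.9:
        # step 2: best B/BB ratio, again over ALL rows of df2 (not the row found in step 1)
        maxima2 = [similar(row['B'], j) for j in df2['BB']]
        best_ratio2 = max(maxima2)
        if best_ratio2 > 0.9:
            # step 3: C/CC; if these cells are numbers or NaN rather than strings,
            # SequenceMatcher raises a TypeError here
            maxima3 = [similar(row['C'], j) for j in df2['CC']]
            best_ratio = max(maxima3)
            best_row = np.argmax(maxima3)
            order.append(best_row)

# only rows that passed all three steps were collected in order
df2 = df2.iloc[order].reset_index()
merge = pd.concat([df1, df2], axis=1)

I want a dataframe like this: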
       A   B    C     AA  BB     CC  score
0  pizza  ze    3  pizza  ze      3    100
1   polo  fe    5   polo  ko      5     75
2  ninja  fi  NaN  ninja   3  pizza     30
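One way to get a table with this shape is sketched below. The approach is an assumption, not necessarily what the post intends. For each df1 row it picks the df2 row whose AA is most similar to A, then scores B/BB and C/CC against that same df2 row. `score` is assumed to be the mean of the three ratios as a percentage, because the post does not say how 100/75/30 are computed. `min_step` holds the lowest ratio, so the 0.9 gate from the attempt can be applied afterwards. `as_text` is also an assumption: it compares numbers and NaN as text, with NaN treated as an empty string and 3.0 as "3". `as_text`, `similar` and `best_match` are hypothetical helper names.

```python
import math
from difflib import SequenceMatcher

import pandas as pd


def as_text(value):
    """Turn a cell into text for SequenceMatcher: NaN/None -> '', 3.0 -> '3'."""
    if value is None:
        return ""
    if isinstance(value, float):  # also covers numpy.float64
        if math.isnan(value):
            return ""
        if value.is_integer():
            return str(int(value))
    return str(value)


def similar(a, b):
    """SequenceMatcher ratio (0..1) of the text forms of a and b."""
    return SequenceMatcher(None, as_text(a), as_text(b)).ratio()


def best_match(df1, df2, pairs=(("A", "AA"), ("B", "BB"), ("C", "CC"))):
    """For each df1 row, pick the df2 row whose first column is most similar,
    then score every column pair against that same df2 row."""
    key1, key2 = pairs[0]
    records = []
    for _, row in df1.iterrows():
        ratios = [similar(row[key1], cand) for cand in df2[key2]]
        best = max(range(len(ratios)), key=ratios.__getitem__)  # first best on ties
        match = df2.iloc[best]
        steps = [similar(row[c1], match[c2]) for c1, c2 in pairs]
        record = {c1: row[c1] for c1, _ in pairs}
        record.update({c2: match[c2] for _, c2 in pairs})
        record["score"] = round(100 * sum(steps) / len(steps))  # assumed: mean ratio in %
        record["min_step"] = min(steps)  # lowest of the A/AA, B/BB, C/CC ratios
        records.append(record)
    return pd.DataFrame(records, index=df1.index)


df1 = pd.DataFrame({"A": ["pizza", "polo", "ninja"],
                    "B": ["ze", "fe", "fi"],
                    "C": [3, 5, float("nan")]})
df2 = pd.DataFrame({"AA": ["za", "po", "fe", "pizza", "polo", "ninja"],
                    "BB": ["ze", "ka", "fe", "fi", "ko", 3],
                    "CC": [float("nan"), 8, 6, 3, 5, "pizza"]})

merged = best_match(df1, df2)
print(merged)

# the sequential 0.9 gate from the original attempt, applied afterwards
strict = merged[merged["min_step"] > 0.9]
print(strict)
```

On these sample frames every A finds an identical AA. However, the B/BB ratio is 0 in all three rows (for example, df2's pizza row has fi in BB), so the scores come out as 67, 67 and 33 and `strict` is empty. Reproducing the exact numbers in the desired table would need a different scoring rule.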