Python Forum

Full Version: matching with SequenceMatcher ratio two dataframe
Hello,

I am using the SequenceMatcher ratio to match the rows of two DataFrames by their best ratio.
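
For reference, here is a minimal sketch of what the ratio looks like on a few strings taken from the data in this post. `SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matching characters and T is the total length of both strings, so a partial match such as "po" against "polo" stays well below a 0.9 threshold. The string pairs are only illustrative picks.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # ratio() is 2 * matches / (len(a) + len(b)), a float between 0 and 1
    return SequenceMatcher(None, a, b).ratio()


# Illustrative pairs taken from the sample values in this post
for left, right in [("pizza", "pizza"), ("pizza", "za"), ("polo", "po")]:
    print(left, right, round(similar(left, right), 3))
```

Only identical or near-identical strings get past 0.9, so that cutoff behaves almost like an exact match on short values.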

I want to first check whether the score between A and AA is good, then whether the score between B and BB is good, then whether the score between C and CC is good. If all three are good, I add the line.

I have two DataFrames. df1:
        A     B     C
0    pizza    ze    3
1    polo     fe    5
2    ninja    fi    NaN
and df2:
     AA      BB      CC
0    za      ze      NaN
1    po      ka       8
2    fe      fe       6
3    pizza   fi       3
4    polo    ko       5
5    ninja   3        pizza
I tried this function, but it doesn't work:
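from difflib import SequenceMatcher

import numpy as np
import pandas as pd

def similar(a, b):
    # Compare as strings so that numbers and NaN do not raise a TypeError
    return SequenceMatcher(None, str(a), str(b)).ratio()

matched = []  # index labels of the df1 rows that passed all checks
order = []    # positions of the best df2 rows (chosen on column CC)
score = []    # best C/CC ratio for each kept row
# Note: each check scans a whole df2 column, so the best A, B and C
# matches can come from different rows of df2.
for index, row in df1.iterrows():
    maxima = [similar(row['A'], j) for j in df2['AA']]
    if max(maxima) > 0.9:
        maxima2 = [similar(row['B'], j) for j in df2['BB']]
        if max(maxima2) > 0.9:
            maxima3 = [similar(row['C'], j) for j in df2['CC']]
            matched.append(index)
            order.append(int(np.argmax(maxima3)))
            score.append(max(maxima3))
# Keep only the matched df1 rows so both sides line up before concatenating
left = df1.loc[matched].reset_index(drop=True)
right = df2.iloc[order].reset_index(drop=True)
merge = pd.concat([left, right], axis=1)
merge['score'] = score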
I want a DataFrame like this:
      A        B         C       AA          BB     CC      score
0    pizza    ze         3        pizza       ze      3      100
1    polo     fe         5        polo        ko      5       75
2    ninja    fi        NaN       ninja       3      pizza    30
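
One possible way to do the "A first, then B, then C" check row by row is sketched below. It is only a sketch under several assumptions that are not in the post: all values are stored as strings, a missing value counts as a ratio of 0, the best df2 row is the one with the highest (A, B, C) ratio tuple so that A decides first and B and C only break ties, and `score` is the mean of the three ratios scaled to 0-100. The helper names `to_text` and `similar` are hypothetical, and the scores it prints are not meant to reproduce the numbers in the table above.

```python
from difflib import SequenceMatcher

import pandas as pd


def to_text(value):
    """Return '' for missing values, otherwise the value as a string."""
    return "" if pd.isna(value) else str(value)


def similar(a, b):
    """Ratio in [0, 1]; a missing value on either side counts as 0 (assumption)."""
    a, b = to_text(a), to_text(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


# Sample data from the post, stored as strings (assumption)
df1 = pd.DataFrame({"A": ["pizza", "polo", "ninja"],
                    "B": ["ze", "fe", "fi"],
                    "C": ["3", "5", None]})
df2 = pd.DataFrame({"AA": ["za", "po", "fe", "pizza", "polo", "ninja"],
                    "BB": ["ze", "ka", "fe", "fi", "ko", "3"],
                    "CC": [None, "8", "6", "3", "5", "pizza"]})

pairs = [("A", "AA"), ("B", "BB"), ("C", "CC")]
rows = []
for _, left in df1.iterrows():
    best_key, best_right = None, None
    for _, right in df2.iterrows():
        # Tuples compare left to right: A decides first, then B, then C
        key = tuple(similar(left[x], right[y]) for x, y in pairs)
        if best_key is None or key > best_key:
            best_key, best_right = key, right
    combined = {**left.to_dict(), **best_right.to_dict()}
    # Mean of the three ratios on a 0-100 scale (assumption)
    combined["score"] = round(100 * sum(best_key) / len(best_key), 1)
    rows.append(combined)

result = pd.DataFrame(rows)
print(result)
```

Because every df1 row is compared against whole df2 rows, the AA, BB and CC values in each output line always come from the same df2 row. A threshold such as 0.9 could be applied to `best_key` afterwards to drop weak matches.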
Quote:I tried this function, but it doesn't work

What about it doesn't work? Are there errors? What are they?
Or is the output not what you expected? What was the output, and what did you expect?
(Feb-11-2021, 09:58 PM)nilamo Wrote:
Quote:I tried this function, but it doesn't work

What about it doesn't work? Are there errors? What are they?
Or is the output not what you expected? What was the output, and what did you expect?
👌🆗️