Python Forum
String Matching Using python - Printable Version




String Matching Using python - SathiyaBeegam - Feb-03-2022

Hi,

I have two files containing item names, and I need to match each item name in one file against the names in the other using Python. I tried fuzzywuzzy and RapidFuzz, but the matching results are not satisfactory. I am attaching both files: the file to be matched (File_ToMatch.xlsx) and the master file from which the matched strings should come (Master_File.xlsx).

Could anyone help me get as many correct matches as possible?

Thanks in advance


RE: String Matching Using python - jefsummers - Feb-04-2022

Please post your code and explain where you think the problem lies. Please also include the output and any errors (complete error messages, as they contain a lot of useful information).


RE: String Matching Using python - SathiyaBeegam - Feb-05-2022

Hi,

Thanks for your reply and time. I have no errors in the code, but the matching was not accurate (out of 132 names, only 64 were matched correctly; the remaining 75 were matched incorrectly).
I am including my code here, which uses RapidFuzz. Please help me get the maximum number of exact matches.

I have attached the Mapped file produced by running the code below. I changed the column headers and added the correct matches manually to make it easier to understand.

Thanks for your help.

#### Code ###

import pandas as pd
import numpy as np
import openpyxl
from rapidfuzz.fuzz import token_set_ratio as rapid_token_set_ratio
from rapidfuzz import process as process_rapid
from rapidfuzz import utils as rapid_utils
from rapidfuzz.process import extractOne
import time

df_To_Match = pd.read_excel('File_ToMatch.xlsx')
df_Master = pd.read_excel('Master_File.xlsx')


lookup_list = list(df_Master["Item_Description_Master"])
matched_values = []

for i in list(df_To_Match["Item_Description_To_Match"]):
    matched_values.append(extractOne(i, lookup_list))


# Store results in a DataFrame
matched_df = pd.DataFrame(
    matched_values,
    columns=["Item_Description_Master", "similarity score", "index in list"],
)

# Concat results with original DataFrame
result = pd.concat([df_To_Match, matched_df], axis=1)
result.to_excel('Mapped.xlsx')
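
One possible direction, offered only as a sketch: the loop above calls extractOne with its default settings, so the imported token_set_ratio and rapid_utils are never used. RapidFuzz's extractOne accepts scorer, processor and score_cutoff keyword arguments, so a token-based scorer, text normalisation and a minimum score could be tried there. The example below illustrates the same two ideas (normalise before comparing, and reject weak matches instead of accepting the best of a bad lot) using only the standard library's difflib as a stand-in, not RapidFuzz itself. The helper names normalize and best_match, the cutoff of 0.8 and the sample item names are all hypothetical and are not taken from the attached files.

```python
import difflib
import re


def normalize(text):
    """Lowercase, drop punctuation, and sort unique tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", str(text).lower())
    return " ".join(sorted(set(tokens)))


def best_match(query, choices, cutoff=0.8):
    """Return (choice, score, index) for the best match.

    If the best score is below the cutoff, return (None, score, None)
    so that the row can be reviewed manually instead of being mis-mapped.
    """
    normalized_query = normalize(query)
    best_choice, best_score, best_index = None, 0.0, None
    for index, choice in enumerate(choices):
        score = difflib.SequenceMatcher(
            None, normalized_query, normalize(choice)
        ).ratio()
        if score > best_score:
            best_choice, best_score, best_index = choice, score, index
    if best_score < cutoff:
        return None, best_score, None
    return best_choice, best_score, best_index


# Hypothetical item names, for illustration only
master = ["Blue Cotton Shirt, Large", "Steel Water Bottle 1L", "Wireless Mouse"]

print(best_match("shirt cotton blue large", master))  # same words, different order
print(best_match("garden hose", master))              # nothing similar in the list
```

With a cutoff in place, names that have no good counterpart in the master list come back as None rather than as a confident-looking wrong match. The right cutoff depends on the data and would need to be tuned against the manually corrected matches.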