Python Forum

String Matching Using Python
Hi,

I have two files with item names, and I need to match the item names from one file against the other using Python. I tried fuzzywuzzy and RapidFuzz, but the results were not satisfactory. I am attaching the two files: the file to be matched (File_ToMatch.xlsx) and the master file that supplies the matched strings (Master_File.xlsx).

Could anyone help me get as many correct matches as possible?

Thanks in advance
Please post your code and say where you think the problem lies. Also include the output and any errors (complete error messages, as they contain a lot of information).
Hi,

Thanks for your reply and time. The code runs without errors, but the matching is poor (out of 132 names, only 64 were matched correctly; the remaining 75 are incorrect matches).
I am including my code below, which uses RapidFuzz. Please help me get the maximum number of exact matches.

I have attached the Mapped file produced by running the code below. I changed the column headers and added the correct matches manually to make it easier to follow.

Thanks for your help.

#### Code ###

import pandas as pd
from rapidfuzz.process import extractOne

df_To_Match = pd.read_excel('File_ToMatch.xlsx')
df_Master = pd.read_excel('Master_File.xlsx')


lookup_list = list(df_Master["Item_Description_Master"])
matched_values = []

for i in list(df_To_Match["Item_Description_To_Match"]):
    matched_values.append(extractOne(i, lookup_list))


# Store results in a DataFrame
matched_df = pd.DataFrame(
    matched_values,
    columns=["Item_Description_Master", "similarity score", "index in list"],
)

# Concat results with original DataFrame
result = pd.concat([df_To_Match, matched_df], axis=1)
result.to_excel('Mapped.xlsx')
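
One thing that may help with a loop like the one above is to normalize both sides before scoring (case, punctuation, word order) and to reject any best match that falls below a cutoff, instead of always accepting the top candidate. The sketch below illustrates that idea only. It uses difflib from the standard library as a stand-in for RapidFuzz so that it runs without extra dependencies, and the helper names (normalize, best_match), the cutoff of 80, and the sample item names are all assumptions made up for illustration, not taken from the attached files. In RapidFuzz itself, extractOne accepts processor and score_cutoff arguments that serve a similar purpose.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and sort the tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", str(text).lower()).split()
    return " ".join(sorted(tokens))


def best_match(query, choices, cutoff=80.0):
    """Return (choice, score, index) for the best match, or None below cutoff."""
    query_norm = normalize(query)
    best = None
    for index, choice in enumerate(choices):
        # Score on a 0-100 scale using the normalized forms of both strings.
        score = 100.0 * SequenceMatcher(None, query_norm, normalize(choice)).ratio()
        if best is None or score > best[1]:
            best = (choice, score, index)
    # Reject weak matches instead of returning a misleading best guess.
    if best is None or best[1] < cutoff:
        return None
    return best


# Hypothetical item names, invented for illustration only.
master = ["Steel Bolt M8 x 40", "Copper Pipe 15mm", "PVC Elbow 90 deg"]

print(best_match("bolt, steel m8 X 40", master))
print(best_match("garden hose", master))
```

Rows that come back as None can then be reviewed by hand rather than being written to the output as if they were confident matches. The right cutoff depends on the data, so it is worth trying a few values against the manually corrected matches.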