String Matching Using python - Python Forum (https://python-forum.io), Forum: Data Science (https://python-forum.io/forum-44.html), Thread: /thread-36273.html
String Matching Using python - SathiyaBeegam - Feb-03-2022

Hi, I have two files with item names. I have to match the item names from one file to the other using Python. I tried fuzzywuzzy and RapidFuzz, but the matching is not satisfactory. I am attaching the two files: the one to be matched (File_ToMatch.xlsx) and the master file that supplies the matched strings (Master_File.xlsx). Could anyone help me get the maximum number of correct matches? Thanks in advance.

RE: String Matching Using python - jefsummers - Feb-04-2022

Please post your code and where you think the problem lies. Please also include the output results and any errors (complete error messages, as there is a lot of information there).

RE: String Matching Using python - SathiyaBeegam - Feb-05-2022

Hi, thanks for your reply and time. I have no errors in the code, but the matching was not proper (out of 132 names, only 64 were matched correctly; the remaining 75 are incorrect matches). I am including my code here, which uses RapidFuzz. Please help me get the maximum number of exact matches. I have attached the mapped file produced by running the code below. I have changed the column headers and added the correct matches manually for your better understanding. Thanks for your help.
#### Code ###
import pandas as pd
import numpy as np
import openpyxl
from rapidfuzz.fuzz import token_set_ratio as rapid_token_set_ratio
from rapidfuzz import process as process_rapid
from rapidfuzz import utils as rapid_utils
from rapidfuzz.process import extractOne
import time

df_To_Match = pd.read_excel('File_ToMatch.xlsx')
df_Master = pd.read_excel('Master_File.xlsx')

lookup_list = list(df_Master["Item_Description_Master"])

matched_values = []
for i in list(df_To_Match["Item_Description_To_Match"]):
    matched_values.append(extractOne(i, lookup_list))

# Store results in a DataFrame
matched_df = pd.DataFrame(
    matched_values,
    columns=["Item_Description_Master", "similarity score", "index in list"],
)

# Concat results with original DataFrame
result = pd.concat([df_To_Match, matched_df], axis=1)
result.to_excel('Mapped.xlsx')
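The code above calls extractOne with its default scorer and no preprocessing, so differences in case, punctuation, or word order can lower the scores, and every item receives a "best" match even when none is close. The following is only a minimal sketch of two common remedies: normalizing the strings before comparing them, and rejecting matches below a cutoff so they can be reviewed by hand. It deliberately uses the standard library's difflib as a stand-in rather than RapidFuzz's own scoring, so the scores will differ from RapidFuzz's. The helper names, the sample item names, and the 0.8 cutoff are all assumptions for illustration; they are not taken from the attached files.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace non-alphanumerics with spaces, and sort the tokens."""
    tokens = re.sub(r"[^a-z0-9]+", " ", str(text).lower()).split()
    return " ".join(sorted(tokens))


def best_match(query, choices, cutoff=0.8):
    """Return (choice, score, index) for the best choice, or None below the cutoff.

    The cutoff of 0.8 is an arbitrary assumption; tune it against known matches.
    """
    normalized_query = normalize(query)
    best = None
    for index, choice in enumerate(choices):
        score = SequenceMatcher(None, normalized_query, normalize(choice)).ratio()
        if best is None or score > best[1]:
            best = (choice, score, index)
    if best is None or best[1] < cutoff:
        return None
    return best


# Hypothetical item names, not taken from the attached Excel files
master = ["Apple Juice 1L", "Orange Juice 500ml", "Whole Milk 2L"]

print(best_match("juice, apple 1l", master))  # same tokens in a different order
print(best_match("Garden Hose", master))      # nothing close enough, so None
```

With RapidFuzz itself, a comparable effect may be obtainable through the `scorer`, `processor`, and `score_cutoff` keyword arguments of `extractOne`, which returns None when no choice reaches the cutoff. Whether that improves the 132-name result depends on the actual data, so the rejected items would still need a manual check.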