Feb-05-2022, 06:46 AM
Hi,
Thanks for your reply and your time. The code runs without errors, but the matching is poor (out of 132 names, only 64 were matched correctly; the remaining 75 are incorrect matches).
I am including my code below, which uses RapidFuzz. Please help me get the maximum number of exact matches.
I have attached the Mapped file produced by running the code below. I changed the column headers and added the correct matches manually to make it easier to follow.
Thanks for your help.
#### Code ###
import pandas as pd
import numpy as np
import openpyxl
from rapidfuzz.fuzz import token_set_ratio as rapid_token_set_ratio
from rapidfuzz import process as process_rapid
from rapidfuzz import utils as rapid_utils
from rapidfuzz.process import extractOne
import time
df_To_Match = pd.read_excel('File_ToMatch.xlsx')
df_Master = pd.read_excel('Master_File.xlsx')
lookup_list = list(df_Master["Item_Description_Master"])
matched_values = []
for i in list(df_To_Match["Item_Description_To_Match"]):
    matched_values.append(extractOne(i, lookup_list))
# Store results in a DataFrame
matched_df = pd.DataFrame(
    matched_values,
    columns=["Item_Description_Master", "similarity score", "index in list"],
)
# Concat results with original DataFrame
result = pd.concat([df_To_Match, matched_df], axis=1)
result.to_excel('Mapped.xlsx')
Attached Files
Mapped.xlsx (Size: 17.51 KB / Downloads: 0)