Python Forum
String Matching Using Python
#1
Hi,

I have two files with item names. I need to match each item name in one file against the names in the other using Python. I tried fuzzywuzzy and RapidFuzz, but the matching results are not satisfactory. I am attaching the two files: the file to be matched (File_ToMatch.xlsx) and the master file that supplies the matched strings (Master_File.xlsx).

Could anyone help me get as many correct matches as possible?

Thanks in advance

Attached Files

.xlsx   File_ToMatch.xlsx (Size: 11.72 KB / Downloads: 0)
.xlsx   Master_File.xlsx (Size: 16.06 KB / Downloads: 0)
#2
Please post your code and say where you think the problem lies. Include the output and any errors (complete error messages, since they carry a lot of information).
#3
Hi,

Thanks for your reply and time. The code runs without errors, but the matching is poor (out of 132 names, only 64 were matched correctly; the remaining 75 were matched incorrectly).
I am including my code below; it uses RapidFuzz. Please help me get the maximum number of exact matches.

I have attached the mapped file produced by running the code below. I changed the column headers and added the correct matches manually to make it easier to follow.

Thanks for your help.

#### Code ###

import pandas as pd
from rapidfuzz.process import extractOne

# Load the names to match and the master list
df_To_Match = pd.read_excel('File_ToMatch.xlsx')
df_Master = pd.read_excel('Master_File.xlsx')

lookup_list = list(df_Master["Item_Description_Master"])
matched_values = []

# extractOne returns a (match, score, index) tuple for the best candidate.
# No scorer is passed here, so the library default (WRatio) is used.
for i in list(df_To_Match["Item_Description_To_Match"]):
    matched_values.append(extractOne(i, lookup_list))

# Store results in a DataFrame
matched_df = pd.DataFrame(
    matched_values,
    columns=["Item_Description_Master", "similarity score", "index in list"],
)

# Concat results with original DataFrame
result = pd.concat([df_To_Match, matched_df], axis=1)
result.to_excel('Mapped.xlsx')

Attached Files

.xlsx   Mapped.xlsx (Size: 17.51 KB / Downloads: 0)