Python Forum
How do I improve string similarity in my current code?
#1
Here’s my current code:
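import pandas as pd
from fuzzywuzzy import fuzz  # assumption: fuzz comes from fuzzywuzzy; thefuzz and rapidfuzz expose the same function

new_list = []

# Compare every Amazon title with every Google title.
for i in range(len(title1)):
    for j in range(len(title2)):
        # token_sort_ratio returns a similarity score from 0 to 100 (higher means more similar).
        title_distance = fuzz.token_sort_ratio(title1[i], title2[j])
        # Keep the ID pair when the score exceeds the threshold.
        if title_distance > threshold:
            new_list.append([amazon_s['idAmazon'][i], google_s['idGoogleBase'][j]])

df = pd.DataFrame(new_list)
df.to_csv('task.csv')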
Here, title1 is a list of product titles from the Amazon website,
and title2 is a list of product titles from the Google website.

id1 is the ID of the corresponding Amazon product.
id2 is the ID of the corresponding Google product.

Both lists contain product titles, and a pair of titles counts as a match in two cases:
Case 1: The titles are the same
Case 2: The titles are similar

After matching them,
I would like to write the results to an Excel file using pandas,
containing id1 and id2 for every pair that satisfies Case 1 or Case 2.
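
To make the two cases concrete, here is a minimal sketch of how a pair could be labelled. It uses the standard library's difflib rather than the fuzz function in my code, and the normalize and classify helpers, the 0.8 threshold, and the sample IDs and titles are all made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(title):
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(title.lower().split())


def classify(title_a, title_b, threshold=0.8):
    """Return 'same' (Case 1), 'similar' (Case 2), or None (no match)."""
    a, b = normalize(title_a), normalize(title_b)
    if a == b:
        return "same"
    # SequenceMatcher.ratio() is a similarity in [0, 1]; the threshold is an assumption.
    if SequenceMatcher(None, a, b).ratio() > threshold:
        return "similar"
    return None


# Hypothetical sample data: (id, title) pairs standing in for the real lists.
amazon = [("a1", "Adobe Photoshop Elements 6"), ("a2", "Norton AntiVirus 2008")]
google = [
    ("g1", "adobe photoshop  elements 6"),
    ("g2", "Norton Anti-Virus 2008"),
    ("g3", "Quicken Deluxe"),
]

matches = []
for id1, title_a in amazon:
    for id2, title_b in google:
        case = classify(title_a, title_b)
        if case is not None:
            matches.append((id1, id2, case))

print(matches)
```

Each tuple in matches holds id1, id2 and the case that was satisfied, so the list can be passed to pd.DataFrame in the same way as new_list.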

Any thoughts on this problem?
#2
You will need to import the pandas module first, like this: import pandas as pd
#3
(May-27-2020, 08:50 AM)Calli Wrote: You will need to import the pandas module first, like this: import pandas as pd

It's not the code that I need to improve, but the algorithm for solving this problem.
Right now:

Recall: 0.72307 out of 1
Precision: 0.77049 out of 1

recall = tp/(tp+fn)
precision = tp/(tp+fp)

True Positive Count: 94
False Positive Count: 28
False Negative Count: 36
True Negative Count: 22042
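
For reference, the two figures can be reproduced from the counts above with a few lines. This is only a sketch: the function name is hypothetical, and the F1 score is an extra summary number that is not part of the results reported here.

```python
def precision_recall_f1(tp, fp, fn):
    # precision = tp / (tp + fp), recall = tp / (tp + fn), as defined above.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall (added for illustration).
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp=94, fp=28, fn=36)
print(f"precision={precision:.5f} recall={recall:.5f} f1={f1:.5f}")
# prints: precision=0.77049 recall=0.72308 f1=0.74603
```

The recall prints as 0.72308 because it is rounded; the 0.72307 quoted above is the same value truncated.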
#4
Nice description of the fuzz functions here:

https://stackoverflow.com/questions/3180...-2-strings
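
As a rough illustration of why the choice of fuzz function matters, the sketch below compares a plain character ratio with a token-sorted one. It is a standard-library approximation built on difflib, not the fuzz library's own implementation, and the function names and sample titles are made up.

```python
import re
from difflib import SequenceMatcher


def plain_ratio(a, b):
    # Character-level similarity scaled to 0-100.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort(s):
    # Lowercase, replace non-alphanumeric runs with spaces, then sort the words.
    tokens = re.sub(r"[^a-z0-9]+", " ", s.lower()).split()
    return " ".join(sorted(tokens))


def token_sort_ratio_sketch(a, b):
    # Word order and punctuation no longer affect the score.
    return plain_ratio(token_sort(a), token_sort(b))


a = "Microsoft Office 2007 Standard"
b = "office standard 2007 (microsoft)"

print(plain_ratio(a, b))              # lower, because order and case differ
print(token_sort_ratio_sketch(a, b))  # 100, because the sorted tokens are identical
```

Sorting tokens helps when two sites list the same words in a different order, but it will not help when one title has extra or missing words.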