I am currently working with a huge CSV file. I need to select one row and compare it with every other row, and get back how many times the words of the first string occur in the second string. For example, if the first string is "Card" and the second string is "Credit Card Debit Card", it should return 2. The `find_similar` function is meant to do that, but it doesn't work the way I want. What I currently have is this:
    def find_similar(a, b):
        # Set intersection: each shared word is counted only once
        return a & b

    def similarity_test(a):
        strword1 = set(dff['Product'].str.split().iloc[a])
        for j in range(0, 100):
            strword2 = set(dff['Product'].str.split().iloc[j])
            lenofWord1 = len(strword1)
            lenofWord2 = len(strword2)
            samewords = find_similar(strword1, strword2)
            samelen = len(samewords)
            if samelen == 0:
                print("Not alike")
            else:
                if lenofWord1 > lenofWord2:
                    length = lenofWord1
                    print("%", (samelen / length) * 100)
                elif lenofWord1 < lenofWord2:
                    length = lenofWord2
                    print("%", (samelen / length) * 100)
                elif lenofWord1 == lenofWord2:
                    length = lenofWord1
                    print("%", (samelen / length) * 100)

    data = int(input("Which index should be tested:"))
    similarity_test(data)

It works well when the strings are 100% similar or when no word of string1 is repeated in string2. What it should do is shown in the picture, but it gives me 12.5% instead of 25%. How can I solve this? I've included everything I use in the code except the dataframe. Thanks in advance.
![[Image: E3m4J.png]](https://i.stack.imgur.com/E3m4J.png)
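
To make the expected behaviour concrete, here is a minimal sketch of the counting I have in mind, working on plain strings rather than on the dataframe. The names `count_occurrences` and `similarity_percent` are hypothetical and not part of my code above. Dividing by the total number of words in the longer string is also an assumption; my current code divides by the number of unique words, because it uses sets.

```python
from collections import Counter


def count_occurrences(first, second):
    """Count the words in `second` that also appear in `first`, keeping repeats."""
    first_words = set(first.split())
    # Counter keeps how many times each word occurs, unlike a set
    second_counts = Counter(second.split())
    return sum(n for word, n in second_counts.items() if word in first_words)


def similarity_percent(first, second):
    """Matched words as a percentage of the longer string's word count (assumed)."""
    longest = max(len(first.split()), len(second.split()))
    if longest == 0:
        return 0.0
    return count_occurrences(first, second) / longest * 100


print(count_occurrences("Card", "Credit Card Debit Card"))   # 2
print(similarity_percent("Card", "Credit Card Debit Card"))  # 50.0
```

With this, "Card" against "Credit Card Debit Card" gives 2 matches, whereas the set intersection in `find_similar` only ever gives 1 for a repeated word.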