Aug-03-2021, 02:41 AM
You cannot find "john smith" because it is two words, and the "tokenizer" code splits the text file into individual words, so no single token can ever equal the whole phrase. I would try the various ratio functions and see how well "john smith" matches the entire text file content treated as one long string. Take a look at this DataCamp article:
https://www.datacamp.com/community/tutor...ing-python
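Here's a minimal sketch of the idea. It uses only the standard library's `difflib.SequenceMatcher` as a stand-in for a fuzzy "ratio" function, so it is an approximation and not any fuzzy-matching library's own code. The names `whole_text_ratio`, `short_file` and `long_file` are made up for the example. It shows that splitting on whitespace never produces a "john smith" token. It also shows that scoring against the whole content as one string works on a short file, but the score drops as the file grows, because the ratio depends on the combined length of both strings.

```python
from difflib import SequenceMatcher

def whole_text_ratio(query: str, text: str) -> int:
    # Treat the entire file content as one long string and score it against
    # the query on a 0-100 scale (difflib used as a stand-in for a fuzzy ratio).
    return round(100 * SequenceMatcher(None, query.lower(), text.lower()).ratio())

# Hypothetical file contents, just for illustration.
short_file = "John Smith."
long_file = "Invoice for John Smith, due next month."

# Word-by-word tokens can never equal a two-word phrase.
print("john smith" in long_file.lower().split())  # False

print(whole_text_ratio("john smith", short_file))  # 95
print(whole_text_ratio("john smith", long_file))   # lower: extra text dilutes the score
```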
If your files are really short, I think token_set_ratio looks promising.
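To show why token_set_ratio fits this case, here is a rough standard-library re-creation of the idea behind it. This is my own approximation for illustration, not the library's code. The names `token_set_ratio_sketch`, `_process` and `_ratio` are hypothetical. The function compares the sorted set of words the two strings share against each side's full word set. If every word of the query appears somewhere in the file, the score is 100, however long the file is.

```python
import re
from difflib import SequenceMatcher

def _process(s: str) -> str:
    # Lowercase and turn anything that isn't a letter or digit into a space
    # (roughly similar to typical fuzzy-matching preprocessing).
    return re.sub(r"[^0-9a-z]+", " ", s.lower()).strip()

def _ratio(a: str, b: str) -> int:
    # 0-100 similarity via difflib; an empty string scores 0.
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())

def token_set_ratio_sketch(query: str, text: str) -> int:
    tokens_q = set(_process(query).split())
    tokens_t = set(_process(text).split())
    if not tokens_q or not tokens_t:
        return 0
    # Words both strings share, sorted so word order doesn't matter.
    common = " ".join(sorted(tokens_q & tokens_t))
    # Shared words followed by each side's leftover words.
    q_side = (common + " " + " ".join(sorted(tokens_q - tokens_t))).strip()
    t_side = (common + " " + " ".join(sorted(tokens_t - tokens_q))).strip()
    return max(_ratio(common, q_side), _ratio(common, t_side), _ratio(q_side, t_side))

print(token_set_ratio_sketch("john smith", "Report filed by John Smith on Monday."))  # 100
print(token_set_ratio_sketch("john smith", "John Doe met Jane Smith"))                # 100
```

That second line is also the catch. The words only have to appear somewhere, not next to each other, so "John Doe met Jane Smith" scores 100 for "john smith" too. The longer the file, the more likely that kind of false positive becomes. That is probably part of why it looks most promising for short files.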