Jan-29-2023, 04:34 AM
(This post was last modified: Jan-29-2023, 04:34 AM by deanhystad.)
You are doing the search backwards. Search for keywords in the sample text, not the other way around.
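To make the direction concrete, here is a minimal sketch that uses only the `in` operator. The loop runs over the handful of keywords, and each `in` test scans the long text. The keywords and text are illustrative. Like the regex version, this is a substring test, so "orange" would also match inside "oranges".

```python
# Illustrative keywords and text (shortened version of the sample string).
keywords = ('apple', 'banana', 'orange')
text = 'this is a test string it contains apple, orange & banana.'

# Loop over the few keywords and search the long text for each one,
# rather than looping over the text and looking each piece up in the keywords.
found_any = any(keyword in text for keyword in keywords)
found_all = all(keyword in text for keyword in keywords)

print(found_any, found_all)  # True True
```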
You can use regular expressions.
```python
import re

keywords = ('apple', 'banana', 'orange')
teststring = (
    'this is a test string it contains apple, orange & banana. '
    'Moreover, this i a very long string contact length more than 500k'
)

# This is any: one search for 'apple|banana|orange' returns the first match, or None
pattern = '|'.join(keywords)
found_any = re.search(pattern, teststring)

# This is all: one search per keyword; all() is True only if every search finds a match
found_all = all(
    re.search(keyword, teststring) for keyword in keywords
)
print(found_any, found_all, sep="\n")
```
Output:
<re.Match object; span=(34, 39), match='apple'>
True
Testing with "I like oranges and apples.":
Output:
<re.Match object; span=(7, 13), match='orange'>
False
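The regex approach can also report which keywords were found, not just whether any or all of them were. This is a sketch built around a hypothetical helper, `find_keywords`. It passes each keyword through `re.escape` in case a keyword contains characters such as `.` or `+`, and uses `re.findall` to collect every match. The `whole_words` flag is also an assumption of this sketch: it wraps the alternation in `\b` word boundaries so that "orange" no longer matches inside "oranges".

```python
import re

def find_keywords(keywords, text, whole_words=False):
    """Return the set of keywords that occur in text (hypothetical helper)."""
    # re.escape protects against keywords that contain regex metacharacters.
    alternatives = '|'.join(map(re.escape, keywords))
    if whole_words:
        # \b requires a word boundary on both sides of the keyword.
        pattern = rf'\b(?:{alternatives})\b'
    else:
        pattern = alternatives
    # With no capturing groups, findall returns the matched text itself.
    return set(re.findall(pattern, text))

keywords = ('apple', 'banana', 'orange')
text = 'I like oranges and apples.'

found = find_keywords(keywords, text)
print(sorted(found))                                     # ['apple', 'orange']
print(bool(found), found == set(keywords))               # True False
print(find_keywords(keywords, text, whole_words=True))   # set()
```

A non-empty result answers "any", and a result equal to the full keyword set answers "all".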
A different approach is to use sets. Once the text has been converted to a set of words, each keyword lookup is a hash-table check, so the matching itself is very fast. The results will be slightly different: the regex matches "orange" inside "oranges", but a set intersection treats them as different words. To use sets, you first need to convert teststring to a set of words. This requires removing all punctuation and splitting on whitespace. You probably also want to convert everything to lower (or upper) case, so capitalization doesn't prevent matches.

```python
import string

keywords = {'apple', 'banana', 'orange'}
teststring = (
    'this is a test string it contains apple, orange & banana. '
    'Moreover, this i a very long string contact length more than 500k'
)

# Delete every punctuation character, lowercase, then split on whitespace.
# str.split() with no argument already discards the surrounding whitespace.
trans = str.maketrans('', '', string.punctuation)
testwords = set(teststring.translate(trans).lower().split())
print(keywords.intersection(testwords))
```
Output:
{'apple', 'orange', 'banana'}
I like how the intersection gives you the information you need for both any (the result is non-empty) and all (the result contains every keyword).
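Here is a minimal sketch of how both answers fall out of the intersection. The word set is built the same way as in the set example, using a shortened sample text. The set difference, an addition not in the original code, lists the keywords that were not found.

```python
import string

keywords = {'apple', 'banana', 'orange'}
text = 'this is a test string it contains apple, orange & banana.'

# Drop punctuation, lowercase, and split on whitespace to get the set of words.
trans = str.maketrans('', '', string.punctuation)
words = set(text.translate(trans).lower().split())

found = keywords & words          # keywords that appear as whole words
found_any = bool(found)           # at least one keyword is present
found_all = found == keywords     # every keyword is present
missing = keywords - words        # keywords that were not found

print(sorted(found), found_any, found_all, sorted(missing))
# ['apple', 'banana', 'orange'] True True []
```

`found_all` could equally be written as `keywords.issubset(words)`.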