Posts: 12
Threads: 5
Joined: Dec 2019
I have a paragraph, contained in a string variable, that looks like this:
Quote: This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it doesn't, I want to keep it.
I want the string to become:
Quote: This is an example of a paragraph that I have. If it doesn't, I want to keep it.
I could use any help!
Thank you in advance!
Posts: 12,038
Threads: 487
Joined: Sep 2016
What have you tried so far?
Posts: 12
Threads: 5
Joined: Dec 2019
str_to_clean = re.sub("^.*\b(bad|naughty)\b.*$", "", str_to_clean, flags=re.IGNORECASE)
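For what it's worth, here is a likely reason this pattern does not do what was intended. In a normal (non-raw) Python string, "\b" is a backspace character rather than a regex word boundary, so the pattern never matches. With a raw string it does match, but ^.*...$ then covers the whole line, so the entire paragraph is removed instead of one sentence. A small sketch, using a made-up sample text:

```python
import re

# Sample text is made up for illustration.
text = "Keep this one. This one is bad. Keep this too."

# Non-raw string: "\b" is a backspace character, so nothing matches
# and the text comes back unchanged.
unchanged = re.sub("^.*\b(bad|naughty)\b.*$", "", text, flags=re.IGNORECASE)

# Raw string: \b is now a word boundary, but ^.*...$ spans the whole line,
# so everything is removed, not just the offending sentence.
emptied = re.sub(r"^.*\b(bad|naughty)\b.*$", "", text, flags=re.IGNORECASE)

print(repr(unchanged))
print(repr(emptied))
```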
Posts: 12,038
Threads: 487
Joined: Sep 2016
You can also use existing packages.
Here's one that claims to be much faster than regex: https://pypi.org/project/better-profanity/
Posts: 12
Threads: 5
Joined: Dec 2019
Thanks! I looked into that, but it simply replaces the words, and I need to remove the whole sentence. I have it working by going through the paragraph sentence by sentence (splitting on '.') and then combining the sentences that don't contain those words. It works, but it seems like an extremely inelegant solution.
Posts: 12
Threads: 5
Joined: Dec 2019
import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."
cleaned_str = ""
# Split on periods and keep only the sentences without a blocked word.
for sentence in str_to_clean.split("."):
    if not re.search("bad|naughty", sentence, flags=re.IGNORECASE):
        cleaned_str = cleaned_str + sentence
print(cleaned_str)
The above works, but it seems...not the best.
Posts: 443
Threads: 1
Joined: Sep 2018
Since you split the string on ".", you need to reinsert the periods. Changing cleaned_str to a list and using str.join() will get that done.
import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."
cleaned_str = []
for sentence in str_to_clean.split("."):
    if not re.search("bad|naughty", sentence, flags=re.IGNORECASE):
        cleaned_str.append(sentence)
# Joining on "." restores the periods that split() removed.
print(".".join(cleaned_str))
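One possible refinement of the split-and-join approach, as a rough sketch rather than a definitive answer: split after '.', '!' or '?' so the punctuation stays attached to each sentence, and match whole words with \b in a raw string so that a word like "badge" is not caught by "bad". The names remove_sentences and BLOCKED_WORDS are made up for this example, and the sentence splitting is deliberately naive (abbreviations such as "e.g." would be split too).

```python
import re

# Hypothetical names for illustration; not from the posts above.
BLOCKED_WORDS = ("bad", "naughty")

def remove_sentences(text, blocked=BLOCKED_WORDS):
    """Return text without the sentences that contain a blocked word."""
    # Raw strings so \b is a regex word boundary, not a backspace character.
    blocked_re = re.compile(
        r"\b(?:" + "|".join(map(re.escape, blocked)) + r")\b",
        re.IGNORECASE,
    )
    # Split on whitespace that follows '.', '!' or '?'; the lookbehind
    # keeps the punctuation attached to the sentence before it.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if not blocked_re.search(s))

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."
print(remove_sentences(str_to_clean))
```

Because the punctuation is never removed, there is nothing to reinsert, and sentences ending in '!' or '?' keep their original endings.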