Python Forum

Remove a sentence if it contains a word.
I have a paragraph, contained in a string variable, that looks like this:

"This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it doesn't, I want to keep it."

I want the string to become:

"This is an example of a paragraph that I have. If it doesn't, I want to keep it."

Any help would be appreciated.

Thank you in advance!
What have you tried so far?
import re
str_to_clean = re.sub(r"\s*[^.]*\b(?:bad|naughty)\b[^.]*\.", "", str_to_clean, flags=re.IGNORECASE)
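A regex-based approach can be wrapped in a small helper that builds the pattern from a list of words. Two details matter: the pattern must be a raw string (r"...") so that \b means a word boundary rather than a backspace character, and it should match one sentence at a time rather than the whole string with ^...$, which would delete the entire paragraph. This is a sketch; the name remove_sentences_with is hypothetical, and it assumes every sentence ends with a period, so abbreviations such as "e.g." would be split incorrectly.

```python
import re


def remove_sentences_with(text: str, words: list[str]) -> str:
    """Drop every period-terminated sentence that contains one of `words`.

    Hypothetical helper: assumes sentences end with "." and that each
    banned word should match only as a whole word, case-insensitively.
    """
    # re.escape guards against words that contain regex metacharacters.
    alternation = "|".join(re.escape(word) for word in words)
    # Optional leading whitespace, then a run of non-period characters that
    # contains a banned word bounded by \b on both sides, up to the next ".".
    pattern = rf"\s*[^.]*\b(?:{alternation})\b[^.]*\."
    # strip() removes the leading space left over if the first sentence is dropped.
    return re.sub(pattern, "", text, flags=re.IGNORECASE).strip()


paragraph = (
    "This is an example of a paragraph that I have. I would like to remove any "
    "sentences containing certain words, for example the word bad, or naughty. "
    "If it has bad, I don't want it. If it doesn't, I want to keep it."
)
print(remove_sentences_with(paragraph, ["bad", "naughty"]))
# This is an example of a paragraph that I have. If it doesn't, I want to keep it.
```

Because [^.] cannot cross a period, each match stays inside a single sentence, and the \b anchors keep words like "badge" from triggering a removal. A final sentence with no closing period is left alone.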
You can also use an existing package. Here's one that claims to be much faster than regex: https://pypi.org/project/better-profanity/
Thanks! I looked into that, but it only replaces the words, and I need to remove the whole sentence. I have it working by going through the paragraph sentence by sentence (splitting on '.') and then combining the sentences that don't contain those words. It works, but it seems like an extremely inelegant solution.
import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."

cleaned_str = ""

# Keep only the sentences that contain neither word (matched as whole words).
for sentence in str_to_clean.split("."):
    if not re.search(r"\b(?:bad|naughty)\b", sentence, flags=re.IGNORECASE):
        cleaned_str += sentence

print(cleaned_str)
The above works, but it doesn't seem like the best way to do it.
Since you split the string on ".", the periods are discarded, so you need to reinsert them. Collecting the kept sentences in a list and joining them with str.join() takes care of that.

import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."

cleaned_str = []

# Keep each sentence that contains neither word (matched as whole words).
for sentence in str_to_clean.split("."):
    if not re.search(r"\b(?:bad|naughty)\b", sentence, flags=re.IGNORECASE):
        cleaned_str.append(sentence)

# Rejoin with "." to restore the periods that split() removed.
print(".".join(cleaned_str))
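Splitting on "." also misses sentences that end in "?" or "!". A sketch of an alternative, assuming a sentence ends in ".", "!" or "?" followed by whitespace, splits with a lookbehind so the punctuation stays attached to each sentence, then checks each sentence's words against a set, which stays fast as the list of banned words grows. The helper name clean_paragraph, the BANNED set, and the word pattern [a-z']+ are illustrative choices, not something from the thread.

```python
import re

# Banned words are stored lowercase; sentences are lowercased before comparing.
BANNED = frozenset({"bad", "naughty"})


def clean_paragraph(text: str, banned: frozenset[str] = BANNED) -> str:
    """Keep only the sentences that contain none of the banned words.

    Sketch: assumes a sentence ends with ".", "!" or "?" followed by
    whitespace; abbreviations such as "e.g." will be mis-split.
    """
    # The lookbehind splits *after* the punctuation, so each sentence keeps it.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        # Extract lowercase words (letters and apostrophes, e.g. "don't").
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        # isdisjoint() is True when no banned word appears in this sentence.
        if words.isdisjoint(banned):
            kept.append(sentence)
    return " ".join(kept)


paragraph = (
    "This is an example of a paragraph that I have. I would like to remove any "
    "sentences containing certain words, for example the word bad, or naughty. "
    "If it has bad, I don't want it. If it is naughty, I do not want it. "
    "If it doesn't, I want to keep it."
)
print(clean_paragraph(paragraph))
# This is an example of a paragraph that I have. If it doesn't, I want to keep it.
```

Comparing whole tokens means "badge" is kept without any \b anchors; the trade-off is that multi-word phrases cannot be banned this way.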