Python Forum
Remove a sentence if it contains a word.




Remove a sentence if it contains a word. - lokhtar - Feb-10-2020

I have a paragraph, contained in a string variable, that looks like this:

This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it doesn't, I want to keep it.

I want the string to become:

This is an example of a paragraph that I have. If it doesn't, I want to keep it.

Any help would be appreciated!

Thank you in advance!


RE: Remove a sentence if it contains a word. - Larz60+ - Feb-10-2020

What have you tried so far?


RE: Remove a sentence if it contains a word. - lokhtar - Feb-10-2020

I tried this:

import re
str_to_clean = re.sub(r"^.*\b(bad|naughty)\b.*$", "", str_to_clean, flags=re.IGNORECASE)
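For reference, that attempt has two problems. Without the r prefix, "\b" in a normal string literal is a backspace character rather than a word boundary, so the pattern never matches and nothing is removed. And once that is fixed, ^ and $ anchor to the start and end of the whole string, so .* happily spans every sentence and the entire paragraph gets deleted. A minimal single-regex sketch, assuming every sentence ends with "." and sentences are separated by whitespace, is to use [^.]* instead of .* so a match can never run past a period. The names BANNED_SENTENCE and remove_with_regex are just illustrative.

```python
import re

# Assumption: every sentence ends with "." and sentences are separated by whitespace.
BANNED_SENTENCE = re.compile(
    r"(?=[^\s.])"            # don't start a match on whitespace or a period
    r"[^.]*"                 # text before the banned word, never crossing a period
    r"\b(?:bad|naughty)\b"   # a banned word, matched as a whole word
    r"[^.]*\.\s*",           # rest of the sentence, its period, and trailing whitespace
    flags=re.IGNORECASE,
)


def remove_with_regex(text):
    # Delete every matching sentence; strip() tidies up if the last sentence was removed.
    return BANNED_SENTENCE.sub("", text).strip()


paragraph = (
    "This is an example of a paragraph that I have. I would like to remove any "
    "sentences containing certain words, for example the word bad, or naughty. "
    "If it has bad, I don't want it. If it doesn't, I want to keep it."
)
print(remove_with_regex(paragraph))
# This is an example of a paragraph that I have. If it doesn't, I want to keep it.
```

Each removed sentence takes its own trailing space with it, which is why the remaining sentences end up separated by exactly one space.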


RE: Remove a sentence if it contains a word. - Larz60+ - Feb-10-2020

You can also use existing packages. Here's one that claims to be much faster than regex: https://pypi.org/project/better-profanity/


RE: Remove a sentence if it contains a word. - lokhtar - Feb-11-2020

Thanks! I looked into that, but it only replaces the words, and I need to remove the whole sentence. I have it working by going through the paragraph sentence by sentence (splitting on each '.') and then combining the pieces that don't contain those words. It works, but it seems like an extremely inelegant solution.


RE: Remove a sentence if it contains a word. - lokhtar - Feb-11-2020

import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."

cleaned_str = ""

# Split on "." and keep only the pieces that don't mention a banned word
for sentence in str_to_clean.split("."):
    if not re.search("bad|naughty", sentence, flags=re.IGNORECASE):
        cleaned_str = cleaned_str + sentence

print(cleaned_str)

The above works, but it doesn't seem like the best way to do it.


RE: Remove a sentence if it contains a word. - stullis - Feb-11-2020

Since you split the string on ".", you need to reinsert the periods. Changing cleaned_str to a list and using str.join() will get that done.

import re

str_to_clean = "This is an example of a paragraph that I have. I would like to remove any sentences containing certain words, for example the word bad, or naughty. If it has bad, I don't want it. If it is naughty, I do not want it. If it doesn't, I want to keep it."

cleaned_str = []

# Keep only the pieces (sentences without their periods) that don't mention a banned word
for sentence in str_to_clean.split("."):
    if not re.search("bad|naughty", sentence, flags=re.IGNORECASE):
        cleaned_str.append(sentence)

# Put the periods back between the kept pieces
print(".".join(cleaned_str))
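A small variation on the same idea, offered as a sketch rather than a drop-in answer: splitting with re.split() on the whitespace after each sentence-ending mark keeps the punctuation attached to its sentence, so the periods never need to be put back. Wrapping the words in \b boundaries also stops a search for "bad" from throwing out sentences that merely contain "badge". It assumes sentences end in ".", "!" or "?" followed by whitespace (abbreviations like "e.g." will be split wrongly), and BANNED_WORDS, BANNED_RE and remove_sentences are hypothetical names chosen for illustration.

```python
import re

# Hypothetical list of words to filter on; edit as needed.
BANNED_WORDS = ["bad", "naughty"]

# \b...\b makes these whole-word matches, so "badge" or "badly" are left alone.
BANNED_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, BANNED_WORDS)) + r")\b",
    flags=re.IGNORECASE,
)


def remove_sentences(text):
    # Split on whitespace that follows ".", "!" or "?", so each sentence keeps its punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if not BANNED_RE.search(s))


str_to_clean = (
    "This is an example of a paragraph that I have. I would like to remove any "
    "sentences containing certain words, for example the word bad, or naughty. "
    "If it has bad, I don't want it. If it is naughty, I do not want it. "
    "If it doesn't, I want to keep it."
)
print(remove_sentences(str_to_clean))
# This is an example of a paragraph that I have. If it doesn't, I want to keep it.
```

re.escape() keeps any punctuation in the word list from being interpreted as regex syntax.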