Python Forum

How to use the re library to remove irrelevant words?
Hi, I'm currently writing a Python script that creates a word cloud. The problem is that the cloud includes some irrelevant words, such as "https". I tried to import re library but it doesn't work. I also tried the stopwords solution, but that didn't solve the problem either. Any help will be appreciated.
Given that you're creating said word cloud, why would you include words that are irrelevant? And what do you mean by "I tried to import re library but it doesn't work"? It doesn't work how?
I don't know why the cloud includes irrelevant words. I found that if I imported the re library and then ran the command
tweets = [re.sub(r'https\S+', '', x) for x in tweets]
the problem would be solved, but it didn't work.
Ah, right, so it's not the import that does not work, it's the sub() function pattern matching.
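A quick way to narrow that down is to run the pattern on a few sample strings and see what it leaves behind. This is only a minimal sketch; the sample tweets are made up for illustration and are not from the original poster's data.

```python
import re

# Hypothetical sample tweets, invented for this check.
samples = [
    "New post https://t.co/abc123 out now",
    "Old link http://example.com/page",
    "Shouting HTTPS://EXAMPLE.COM here",
]

# Same pattern as in the posted command.
pattern = r'https\S+'

# Apply the substitution to each string, as the list comprehension does.
cleaned = [re.sub(pattern, '', s) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), '->', repr(after))
```

With this pattern only the first sample changes. The plain http link and the upper-case link are left alone, because the pattern is case-sensitive and requires the literal text https.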
Is there not a way to extract just the text from a tweet, rather than including the URL only to have to filter it out? As is, it seems that you're creating an issue that then needs to be solved, but the code snippet is a little too short to fully understand your methodology.

There's a saying that goes something like, when you use regex to solve a problem, you now have two problems to solve.

edit to add:

If there's no other way, I'd try something like this:

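import re

pattern = r'https\S+'  # raw string, so \S reaches the regex engine unchanged
repl = ''
# re.sub() works on one string at a time, so apply it to each tweet in the list
result = [re.sub(pattern, repl, tweet) for tweet in tweets]

print(result)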
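
To check whether the substitution is actually taking effect before the word cloud is built, something like the following could help. It is a rough sketch, not the original poster's code: the tweets list, the word_counts helper and the simple tokenizer are all assumptions for illustration, and the word cloud library may split words differently.

```python
import re
from collections import Counter

# Hypothetical input; the original poster's tweets are not shown in the thread.
tweets = [
    "Loving the new release https://t.co/abc123",
    "Release notes are here https://t.co/xyz789 enjoy",
]

def word_counts(texts):
    """Count lower-cased words across all texts with a simple tokenizer."""
    words = []
    for text in texts:
        # Assumed tokenizer: runs of letters and apostrophes only.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words)

# Word frequencies before and after stripping the URLs.
before = word_counts(tweets)
cleaned = [re.sub(r'https\S+', '', t) for t in tweets]
after = word_counts(cleaned)

print(before['https'], after['https'])
```

If the count drops to zero here but "https" still shows up in the cloud, one thing worth checking is whether the cloud is being generated from the cleaned list or from the original text.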