Nov-19-2020, 08:54 PM
I’ve got a basic Django project. One feature I am working on counts the most commonly occurring words in a .txt file, such as a large public domain book. I’ve used the Python Natural Language Toolkit (NLTK) to filter out “stopwords” (in SEO language, that means redundant words such as ‘the’, ‘you’, etc.).
Anyways, I’m getting this traceback on my Django server:
Quote: Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:
    >>> import nltk
    >>> nltk.download('stopwords')
For more information see: https://www.nltk.org/data.html
After Googling around, I discovered that the reason is that I need to download the library of stopwords. To resolve the issue, I simply open a Python REPL on my remote server and invoke these two straightforward lines:
    >>> import nltk
    >>> nltk.download('stopwords')

That resolves the issue, but only temporarily. As soon as the REPL session is terminated, the error returns. I figure I need to use the built-in .save class method, but I am not sure which attribute to pair it with. Here are the relevant lines from my utils.py file:

    import re
    from collections import Counter
    from nltk.corpus import stopwords  # library used to filter out common English words to produce more meaningful output
    from blogs.models import Posts

    def top_word_counts(text):
        stoplist = stopwords.words('english')
        stoplist.extend(["said", "gutenberg", "could", "would", "shall", "unto", "thou", "thy", "ye", "thee", "upon", "hath", "came", "come", "things", "also", "saying", "say"])
        # Extend the stoplist to include the integers 0 through 1999
        extendinteger = list(range(0, 2000))
        # map() converts each integer to its string form so it can be compared with the words
        stoplist.extend(list(map(str, extendinteger)))
        clean = []
        for word in re.split(r"\W+", text):
            if word not in stoplist:
                clean.append(word)
        top_10 = Counter(clean).most_common(10)
        return top_10

I tried adding import nltk to the top of this script and adding nltk.download('stopwords') to different locations within the top_word_counts function, but that didn’t work. So my question is: How do I invoke nltk.download('stopwords') so that it automatically runs once without having to manually load it in the Python REPL?

Here is the utility file in full in my GitHub repo.
I decided to post to the General Coding Help forum instead of web development because the answer to my question is more to do with Python in general rather than being specific to Django.
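To make the question concrete, here is a minimal sketch of the kind of "download only if missing" check I have in mind. The helper name ensure_nltk_resource and its finder/downloader parameters are hypothetical (they are not part of NLTK); the only NLTK calls it relies on are nltk.data.find, which raises LookupError when a resource is absent, and nltk.download. I am assuming the corpus lives under the resource path "corpora/stopwords".

```python
def ensure_nltk_resource(resource_path, package, finder=None, downloader=None):
    """Download an NLTK package only if its resource cannot be found.

    Returns True if a download was triggered, False if the data was
    already available. `finder` and `downloader` are hypothetical hooks
    that default to nltk.data.find and nltk.download.
    """
    if finder is None or downloader is None:
        # Imported lazily so the helper can be exercised with stand-ins.
        import nltk
        finder = finder or nltk.data.find
        downloader = downloader or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        finder(resource_path)
    except LookupError:
        downloader(package)
        return True
    return False
```

The idea would be to call ensure_nltk_resource("corpora/stopwords", "stopwords") once at the top of utils.py, before stopwords.words('english') is used. If the error still comes back after that, one possibility (an assumption on my part) is that the server process runs as a different user or searches different directories than my REPL session; the directories NLTK searches are listed in nltk.data.path.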