Jan-19-2019, 12:50 AM
(This post was last modified: Jan-19-2019, 04:23 AM by Drone4four.)
I am playing with large plain text files like Alice in Wonderland and trying to rank the most commonly occurring words. Naturally, you’d expect to encounter many instances of “the”, “and”, and “a”.
With a little help from @snippsat in my previous thread, check out this script we were working with:
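from collections import Counter
import re

# Read the whole book and lowercase it so "The" and "the" are counted together
with open('Alice.txt') as f:
    text = f.read().lower()

# \w+ matches runs of letters, digits and underscores
words = re.findall(r'\w+', text)
top_10 = Counter(words).most_common(10)
for word, count in top_10:
    print(f'{word:<4} {"-->":^4} {count:>4}')

Here is the smooth output: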
Output:
$ python with_word_count.py
the --> 1818
and --> 940
to --> 809
a --> 690
of --> 631
it --> 610
she --> 553
i --> 543
you --> 481
said --> 462
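To see how the ranking works on a smaller scale, here is a minimal sketch on a made-up sentence (the sample string is an assumption, not taken from Alice.txt). It shows that re.findall(r'\w+', ...) keeps only runs of word characters, so punctuation is dropped and a contraction such as “it's” is split into “it” and “s”, while str.split() only breaks on whitespace. Counter.most_common(n) then returns the n most frequent tokens, with ties kept in the order the words were first seen. One consequence is that a total computed with text.split() will not necessarily match the number of tokens the regex produces.

```python
from collections import Counter
import re

# Hypothetical sample sentence, lowercased like the script does with Alice.txt
sample = "It's the end, said the cat. THE END!".lower()

# str.split() breaks only on whitespace, so punctuation stays attached ("end,")
split_tokens = sample.split()

# \w+ keeps runs of letters, digits and underscores: punctuation is dropped
# and "it's" becomes two tokens, "it" and "s"
regex_tokens = re.findall(r'\w+', sample)

print(split_tokens)
print(regex_tokens)

# Counter tallies each token; most_common(3) returns the three highest counts
print(Counter(regex_tokens).most_common(3))
```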
It works really well. I have already extended it with a feature that reports the total word count. Here is the code I added:

wordlist = text.split()
print("A total of " + str(len(wordlist)) + " words can be found inside this text file.")

I have now set out to extend the features of this script further. For now, I am just trying to reorganize and consolidate these operations into separate functions, so the script looks a little different. Here it is:
from collections import Counter
import re

def word_count(text):
    wordlist = text.split()
    print("A total of " + str(len(wordlist)) + " words can be found inside this text file.")

def rank_words():
    words = re.findall(r'\w+', text)
    top_10 = Counter(words).most_common(10)
    for word,count in top_10:
        print(f'{word:<4} {"-->":^4} {count:>4}')

def main():
    with open('Alice.txt') as f:
        text = f.read().lower()
    return text

if __name__ == '__main__':
    main()
    word_count(text)
    rank_words()
    pass

Here is the output:
Output:
$ python with_word_count.py
Traceback (most recent call last):
  File "with_word_count.py", line 21, in <module>
    word_count(text)
NameError: name 'text' is not defined
The NameError points to the variable text, which “isn’t defined”. The problem is reported at line 21, where text is passed to word_count(). But text is defined in main(), which is the first function I call when the script runs, as specified under my if __name__ == '__main__': block. If any of you are wondering why I chose to organize my script this way, I am following @ichabod801’s example in another recent thread I was working on here. When word_count() is called, text should already have been returned by the previously called function, main(), right?

Would anyone care to elaborate on what the Python interpreter is saying in this traceback? What am I missing? What would I need to fix in my script for it to run properly as intended?
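A minimal sketch of what the traceback appears to be saying: a name assigned inside a function, like text inside main(), is local to that function and disappears when it returns. The returned value survives only if the caller binds it to a name, for example text = main(). In the sketch below, load_text() is a hypothetical stand-in for main() that returns a hard-coded string instead of reading Alice.txt, and word_count() and rank_words() are reworked (an assumption, not the original code) to take text as a parameter and return their results rather than print them.

```python
from collections import Counter
import re


def load_text():
    # Hypothetical stand-in for main(): returns a fixed string instead of
    # reading Alice.txt. `text` here is a local name of this function only.
    text = "The cat and the hat. The end."
    return text.lower()


def word_count(text):
    # Counts whitespace-separated tokens, like the text.split() version
    return len(text.split())


def rank_words(text, n=10):
    # Reworked to take `text` as a parameter and return (word, count) pairs
    return Counter(re.findall(r'\w+', text)).most_common(n)


# 1) Calling the function without keeping its return value: the string is
#    returned and then discarded, and no module-level `text` is created.
load_text()
try:
    print(text)
except NameError as err:
    discarded_result_error = str(err)
    print(discarded_result_error)  # name 'text' is not defined

# 2) Binding the return value creates `text` in the calling scope.
text = load_text()
print("A total of " + str(word_count(text)) + " words can be found inside this text.")
for word, count in rank_words(text, 3):
    print(f'{word:<4} {"-->":^4} {count:>4}')
```

Because the if __name__ == '__main__': block runs at module level, writing text = main() there would likewise make text visible to the original rank_words(), which reads it as a global. Passing it as a parameter, as sketched here, simply makes that dependency explicit.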
Attached is the public domain text file I am working with.
Attached Files
Alice.txt (Size: 159.97 KB)