Jun-02-2019, 12:44 AM
(This post was last modified: Jun-02-2019, 12:45 AM by Drone4four.)
I substituted the f-string you suggested for mine. I noticed right away that the output was the same as before. It’s identical. At first I thought I had run the wrong file in my shell. In the end I included both f-strings (the original and @ichabod801’s) on the same line, separated by a string of three pipes. So here is what my line 17 looks like now:
print(f'{word!r:<4} {"-->":^4} {count:>4} {"|||":^4} {word:<4} {"-->":^4} {count:>4}')
The first word variable on the left includes !r, whereas the word variable on the right does not. Yet the output remains the same:
Quote:
$ python3 script.py
'said' --> 462 ||| said --> 462
'alice' --> 403 ||| alice --> 403
'i' --> 283 ||| i --> 283
'it' --> 205 ||| it --> 205
's' --> 184 ||| s --> 184
'little' --> 128 ||| little --> 128
'you' --> 115 ||| you --> 115
'and' --> 107 ||| and --> 107
'one' --> 106 ||| one --> 106
'gutenberg' --> 93 ||| gutenberg --> 93
!r is not catching the apostrophes. I read the Python doc @ichabod801 shared and I understand some of it. Apparently !r should filter out apostrophes that are part of a word in a string or set of strings (or, in my case, throughout a full-length book). So my hypothesis was that “s”, “it”, and “it’s” would be removed from the output. Apparently I need a new hypothesis. I’m all out of ideas. What do you think the issue could be here?
Remember: I’m trying to filter out all the common stopwords in a large text file so that the output is the most commonly used nouns in the book Alice in Wonderland.
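For reference, here is a minimal sketch of what the !r conversion does on its own, independent of my script. As far as the format-string syntax goes, !r formats the value with repr() instead of str(), so for a string it only adds surrounding quotes to what gets printed; it does not remove or filter anything. The sample word below is made up for illustration.

```python
# Minimal sketch: what !r does inside an f-string.
word = "it's"  # hypothetical sample word, not taken from the script output

plain = f"{word}"      # formatted with str(word)
quoted = f"{word!r}"   # formatted with repr(word)

print(plain)
print(quoted)

# !r is equivalent to calling repr() on the value before formatting.
assert quoted == repr(word)
# The apostrophe is still present in both versions.
assert "'" in plain and "'" in quoted
```

That would be consistent with the output above, where the left-hand column only differs from the right-hand one by the quotes around each word.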
For what it’s worth, here is my entire script so far:
from collections import Counter
from nltk.corpus import stopwords
import re

def open_file():
    with open('Alice.txt') as f:
        text = f.read().lower()
    return text

def main(text):
    stoplist = stopwords.words('english')  # Bring in the default English NLTK stop words
    clean = [word for word in text.split() if word not in stoplist]
    # Rejoin the filtered words into one string
    clean_text = ' '.join(clean)
    words = re.findall(r'\w+', clean_text)
    top_10 = Counter(words).most_common(10)
    for word, count in top_10:
        print(f'{word!r:<4} {"-->":^4} {count:>4} {"|||":^4} {word:<4} {"-->":^4} {count:>4}')

if __name__ == "__main__":
    text = open_file()
    main(text)

Thanks again, @ichabod801.
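One possible explanation for the stray 's' and 'it' entries, offered as a guess rather than a confirmed diagnosis: the script filters the result of text.split() before re.findall runs, so tokens that still carry punctuation (for example it’s with a curly apostrophe, or it, with a trailing comma) may not match the stoplist and only get broken into it and s afterwards. The sketch below reverses the order, tokenizing first and filtering second. STOPLIST, top_words, and the sample text are hypothetical stand-ins for illustration; they are not part of the script above or of NLTK.

```python
import re
from collections import Counter

# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
STOPLIST = {"i", "it", "s", "you", "and", "the", "a", "said"}

def top_words(text, n=10):
    """Return the n most common words after removing stop words token by token."""
    # Tokenize first, so punctuation attached to a word cannot hide it from the stoplist.
    tokens = re.findall(r"\w+", text.lower())
    kept = [tok for tok in tokens if tok not in STOPLIST]
    return Counter(kept).most_common(n)

# Made-up sample text, not taken from Alice.txt.
sample = "“It’s the rabbit,” said Alice. “I said it’s the rabbit, and you heard.”"
print(top_words(sample, 3))
```

With the real NLTK list, the same idea would mean running re.findall on the full text first and then applying the stoplist to the resulting words, assuming the stoplist contains entries such as i, it, and s.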