Jan-14-2019, 02:30 AM
I’ve got an older Python 2 script from an outdated Udemy course. The script opens a plain text file (such as a long public domain novel like Alice in Wonderland), counts every word, and ranks the 10 most common ones. Naturally, you can expect many occurrences of ‘the’, ‘and’, ‘a’.
It runs as expected using the Python 2 interpreter. Attached is Alice in Wonderland in .txt format. Here is the Python 2 script:
#!/usr/bin/env python
# encoding: utf-8
"""
alice_file.py

Created by Jason Elbourne on 2011-12-29.
Copyright (c) 2011 Jason Elbourne. All rights reserved.
"""

import operator

## Get each word - Turn to Lower case (.lower())
## Count Duplicates of words
## Dictionary {word:count,word2:count2}
## Sort this based on most used word
## Print the Top 20 Words

def rank_words(f):
    """
    Takes in a file, then ranks all the words within the file
    Args: a file
    Return: A sorted list of tuples
    """
    word_dict = {} # Start with empty python Dictionary
    words = [] # Start with empty python List

    for line in f:
        list_of_words = line.split()
        for w in list_of_words:
            words.append(w.lower()) # Add Word to List

    for word in words:
        if word_dict.has_key(word):
            word_dict[word] += 1 # Incr. value in Dict.
        else:
            word_dict[word] = 1 # Add word and value to Dict.
    # This will sort the dictionary and return a list of Tuples
    return sorted(word_dict.iteritems(), reverse=True, \
                  key=operator.itemgetter(1))

def main():
    # Files

    f = open('Alice.txt', 'rU')

    ranked_words_list = rank_words(f)

    f.close()

    # Print the results
    for w in list(ranked_words_list[:10]):
        print w[0],"---", w[1]


if __name__ == '__main__':
    main()

Here is the expected output:
$ python2 pycounter.py
the --- 1605
and --- 766
to --- 706
a --- 614
she --- 518
of --- 493
said --- 421
it --- 362
in --- 351
was --- 333
It runs. Pretty neat, eh?
But on its first run using the Python 3 interpreter, the traceback points to line 52:
$ python pycounter.py
  File "pycounter.py", line 52
    print w[0],"---", w[1]
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(w[0],"---", w[1])?
So I added parentheses on that line, opening before the first w and closing after the second index, w[1].
When I ran the script again I got this traceback:
$ python pycounter.py
pycounter.py:44: DeprecationWarning: 'U' mode is deprecated
  f = open('Alice.txt', 'rU')
Traceback (most recent call last):
  File "pycounter.py", line 56, in <module>
    main()
  File "pycounter.py", line 46, in main
    ranked_words_list = rank_words(f)
  File "pycounter.py", line 38, in rank_words
    return sorted(word_dict.iteritems(), reverse=True, \
AttributeError: 'dict' object has no attribute 'iteritems'
The first issue is the 'U' mode passed to the open function, which is deprecated in Python 3. The official docs say so here. So I removed the U. Problem solved. But I can’t make sense of the other lines indicated in the traceback. Line 56 is the main() call under the module’s __name__ check. I’m not sure what the problem is there. It looks normal and correct to me.

Could someone here lend a helping hand to get this script to run in Python 3?
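
For reference, here is a minimal sketch of what a Python 3 version might look like. It is not the course author's code. In the traceback, only the last frame (line 38, in rank_words) is where the error was raised; lines 56 and 46 are just the calls that led there. The sketch therefore swaps dict.iteritems() for dict.items(). It also replaces dict.has_key(), which no longer exists in Python 3, and opens the file in plain 'r' mode. The sample lines at the bottom are made up for illustration, and main() is defined but not called, so the block runs without Alice.txt being present.

```python
import operator


def rank_words(lines):
    """Count lowercased, whitespace-separated words; return (word, count) pairs, most common first."""
    word_dict = {}
    for line in lines:
        for w in line.split():
            word = w.lower()
            # dict.has_key() was removed in Python 3; dict.get() with a default replaces the check
            word_dict[word] = word_dict.get(word, 0) + 1
    # dict.iteritems() was removed in Python 3; items() returns a view that sorted() accepts
    return sorted(word_dict.items(), key=operator.itemgetter(1), reverse=True)


def main(path='Alice.txt'):
    # Plain 'r' mode: text mode in Python 3 already handles universal newlines.
    # Pass encoding=... to open() if the file is not in the platform default encoding.
    with open(path, 'r') as f:
        ranked_words_list = rank_words(f)
    for word, count in ranked_words_list[:10]:
        print(word, "---", count)


# Hypothetical in-memory sample, so the sketch can be tried without the attached file
sample_lines = ["The cat and the hat", "the end"]
top_word, top_count = rank_words(sample_lines)[0]
print(top_word, "---", top_count)  # the --- 3
```

To run it against the real file, call main() under the usual if __name__ == '__main__': guard, as the original script does.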