Python Forum

Help me with this task
I'm working on a Python project where I need to read a text file, count the occurrences of each word, and then display the top 10 most common words. Here's what I have so far, but I'm struggling with optimizing it and handling punctuation properly. Any suggestions?
from collections import Counter

def count_words(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        words = file.read().lower().split()
        word_counts = Counter(words)
        return word_counts.most_common(10)

filename = 'sample.txt'  # Example file
print(count_words(filename))
Hello and welcome to the forum. .read() returns the file's contents as a single string. At that point you can use .replace() to strip the offending punctuation by replacing each punctuation character with an empty string. Then call .split(), which returns a list of words that Counter can count. Put whichever punctuation characters you want removed in the line for character in "!?.,-()": (note that removing the hyphen joins hyphenated words, so "well-known" becomes "wellknown"). Here's a suggestion that also prints the result a little more nicely.
from collections import Counter

def count_words(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        words = file.read().lower()
        # Remove each unwanted punctuation character from the text.
        for character in "!?.,-()":
            words = words.replace(character, "")
        return Counter(words.split()).most_common(10)

filename = 'sample.txt'  # Example file
# Print one "word count" pair per line instead of a raw list of tuples.
for word, count in count_words(filename):
    print(word, count)
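On the optimizing part of the question: if the file is large or the list of punctuation characters grows, one possible variation is to strip punctuation in a single pass with str.translate() and feed Counter one line at a time instead of reading the whole file into memory. This is only a sketch. The helper name top_words and the use of string.punctuation are assumptions, not something from the posts above, and string.punctuation covers ASCII punctuation only, so curly quotes and other Unicode marks are left in place.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
# Assumption: string.punctuation is the set you want removed. It includes
# the apostrophe and the hyphen, so "don't" becomes "dont".
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def top_words(lines, n=10):
    """Return the n most common words from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase, drop punctuation, then split on whitespace.
        counts.update(line.lower().translate(_STRIP_PUNCT).split())
    return counts.most_common(n)


if __name__ == "__main__":
    # Small in-memory sample so the sketch runs without a file on disk.
    sample = ["The cat sat.", "The cat ran!", "A dog (the big one) sat, too."]
    for word, count in top_words(sample, 3):
        print(word, count)
```

An open file is itself an iterable of lines, so with a real file you could pass it straight in: with open('sample.txt', encoding='utf-8') as file: print(top_words(file)). Only the running counts are held in memory, not the full text.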