Python Forum

Count & Sort occurrences of text in a file
All:
I have an ASCII text file that contains the first and last names from a phone list.
Each full name is on its own line.
I need to generate a new list, sorted by the number of times each name appears in the file.

For example, the file names.txt contains:
bill smith
joe williams
bill smith
jane doe
joe williams
bill smith

The final list should look like this:
bill smith (3)
joe williams (2)
jane doe (1)

All the examples I find online split the text into a list of single letters, which is NOT what I want!

Thanks in advance for your help.
No Python list literal can look like the desired “final list”.
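To illustrate that point: the closest Python value is a list of (name, count) tuples, and the "name (count)" layout only appears when the list is printed. A minimal sketch, with the counts hard-coded from the example above (the variable names are hypothetical):

```python
# The closest Python value: a list of (name, count) tuples, sorted by count.
final_list = [("bill smith", 3), ("joe williams", 2), ("jane doe", 1)]
print(final_list)  # [('bill smith', 3), ('joe williams', 2), ('jane doe', 1)]

# The "name (count)" text is just a formatting of each tuple.
lines = [f"{name} ({count})" for name, count in final_list]
for line in lines:
    print(line)
```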

What have you tried? Please post the code that produces the “single letter list”, and maybe we can help.
from collections import Counter, defaultdict

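with open('phonebook.tmp', "r") as f:
    text = f.read()


freqword = defaultdict(list)
# Counter(text) iterates over the string one character at a time,
# which is why this produces a "single letter list".
for word, freq in Counter(text).items():
    freqword[freq].append(word)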

# print in order of occurrence (with sorted list of words)
for freq in sorted(freqword):
    print('count {}: {}'.format(freq, sorted(freqword[freq])))
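The single-letter result comes from Counter(text): iterating over a str yields one character at a time, so each letter becomes a key. A minimal sketch, with the file contents inlined as a string (an assumption, so it runs without phonebook.tmp), comparing that with counting after splitting the text into lines:

```python
from collections import Counter

# Stand-in for the file contents (assumed, not read from disk).
text = "bill smith\njoe williams\nbill smith\njane doe\njoe williams\nbill smith\n"

# Iterating over a str yields characters, so this counts letters.
by_char = Counter(text)
print(list(by_char)[:5])  # ['b', 'i', 'l', ' ', 's']

# splitlines() makes each full name a single element, so this counts names.
by_name = Counter(text.splitlines())
print(dict(by_name))  # {'bill smith': 3, 'joe williams': 2, 'jane doe': 1}
```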
You're overcomplicating things:
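from collections import Counter

with open('dupes.csv', "r") as f:
    # Strip each line so the trailing newline is not part of the name; skip blank lines.
    names = (line.strip() for line in f if line.strip())
    for name, name_count in Counter(names).items():
        print(f'{name}:{name_count}')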
Output:
bill smith:3
joe williams:2
jane doe:1
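One caveat when counting the lines exactly as the file yields them: each line keeps its trailing newline, so if the last line of the file has no newline, that name is counted under two different keys (and a blank line would be counted too). A minimal sketch, using io.StringIO to stand in for the file (an assumption so it runs on its own):

```python
import io
from collections import Counter

# Simulated file whose last line has no trailing newline.
f = io.StringIO("bill smith\njoe williams\nbill smith\njane doe\njoe williams\nbill smith")

# Raw lines: 'bill smith\n' and 'bill smith' end up as separate keys.
raw = Counter(f)
print(raw)

f.seek(0)
# Stripping each line (and skipping blank ones) merges them into one key.
clean = Counter(line.strip() for line in f if line.strip())
print(clean["bill smith"])  # 3
```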
That's GREAT, but how can I sort the final list according to the number of occurrences?
The list's sort() method and the built-in sorted() function both take a keyword argument called key: a function that is called on each item to obtain the value used for sorting.
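Applied to the counts, a minimal sketch of that key approach (the names are taken from the example in the question; the variable names are hypothetical) sorts the (name, count) pairs by their second element, highest first:

```python
from collections import Counter

counts = Counter(["bill smith", "joe williams", "bill smith",
                  "jane doe", "joe williams", "bill smith"])

# key picks the count out of each (name, count) pair; reverse=True puts the largest first.
by_count = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for name, count in by_count:
    print(f"{name} ({count})")
```

To break ties alphabetically, a key such as lambda item: (-item[1], item[0]) sorts by descending count and then by name, without needing reverse=True.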
(Sep-06-2020, 01:54 PM) ndc85430 wrote: That's GREAT, but how can I sort the final list according to the number of occurrences?
You can use the most_common() method of Counter:
Quote: Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered:

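from collections import Counter

with open('dupes.csv', "r") as f:
    # Strip each line so the trailing newline is not part of the name; skip blank lines.
    names = (line.strip() for line in f if line.strip())
    # most_common() with no argument returns every (name, count) pair, highest count first.
    for name, name_count in Counter(names).most_common():
        print(f'{name}:{name_count}')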
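To get exactly the "name (count)" layout from the original question, the pairs returned by most_common() only need formatting. A minimal sketch, with io.StringIO standing in for names.txt (an assumption so it runs without the file), also showing most_common(n) to keep only the top entries:

```python
import io
from collections import Counter

# Stand-in for open('names.txt'); iterating over it yields one line at a time.
f = io.StringIO("bill smith\njoe williams\nbill smith\njane doe\njoe williams\nbill smith\n")

counts = Counter(line.strip() for line in f if line.strip())

# Format each (name, count) pair as requested in the question.
final_list = [f"{name} ({count})" for name, count in counts.most_common()]
print("\n".join(final_list))

# most_common(2) keeps only the two most frequent names.
top_two = counts.most_common(2)
print(top_two)  # [('bill smith', 3), ('joe williams', 2)]
```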
That did the trick. Thank you all for your help.
I'm a Python newbie, but I can follow most of the syntax, especially once I've seen an example.