Python Forum

Hello. I have some data such as this.

aaa
bbb
ccc
aaa
ccc
ddd
fff
aaa
ccc
aaa

I know that using a set will give me the unique values, but what I need is the unique values and their counts. For example:

aaa 4
bbb 1
ccc 3
ddd 1

Batteries included ...

from collections import Counter


your_input = """aaa
bbb
ccc
aaa
ccc
ddd
fff
aaa
ccc
aaa"""

sequence = your_input.strip().split()
counter = Counter(sequence)

print(counter.most_common(5))
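most_common() returns a list of (value, count) tuples, so the print above won't look quite like the layout in the question. A rough sketch of one way to get "value count" lines, assuming you want them sorted alphabetically (the names lines and report are just for illustration):

```python
from collections import Counter

lines = ["aaa", "bbb", "ccc", "aaa", "ccc", "ddd", "fff", "aaa", "ccc", "aaa"]
counter = Counter(lines)

# Sort by value so the report is in alphabetical order,
# then format each pair as "value count".
report = [f"{value} {count}" for value, count in sorted(counter.items())]
print("\n".join(report))
```

Drop the sorted() call if you would rather keep the order in which the values first appeared.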

Thanks a lot.

Or, if you want to see how to do it by hand:

your_input = """aaa
bbb
ccc
aaa
ccc
ddd
fff
aaa
ccc
aaa"""

counts = {}
for entry in your_input.strip().split():
    # dict.get returns 0 the first time an entry is seen
    counts[entry] = counts.get(entry, 0) + 1

print(counts)
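
If you also want the "most frequent first" ordering by hand, something along these lines should work. This is only a sketch: count_values and most_common are hypothetical helper names, not anything from the standard library.

```python
def count_values(items):
    """Count occurrences with a plain dict."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def most_common(counts, n=None):
    """Return (value, count) pairs, highest count first."""
    # sorted() is stable, so values with equal counts keep
    # the order in which they were first seen.
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if n is None else ranked[:n]


data = ["aaa", "bbb", "ccc", "aaa", "ccc", "ddd", "fff", "aaa", "ccc", "aaa"]
counts = count_values(data)
print(most_common(counts, 2))
```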

from collections import Counter

# Avoid naming the list "input", which would shadow the built-in function
values = ['a', 'a', 'b', 'b', 'b']
c = Counter(values)

print(c.items())

If all you want to do is find the counts (and not do any further processing in a Python script), then if you're on UNIX, you don't need Python at all. Command line tools will do the job:

$ cat << EOF | sort | uniq -c
aaa
bbb
ccc
aaa
ccc
ddd
fff
aaa
ccc
aaa
EOF
      4 aaa
      1 bbb
      3 ccc
      1 ddd
      1 fff

You can of course read from a file instead; I've used a here document to pass the input to cat. uniq's -c option prints the counts, but uniq only compares adjacent lines when collapsing duplicates, so sort is needed first to bring identical lines together.
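
For comparison, itertools.groupby in Python behaves much like uniq in that it only groups adjacent equal items, so the same "sort first" step applies. A small sketch (the variable names are just for illustration):

```python
from itertools import groupby

lines = ["aaa", "bbb", "ccc", "aaa", "ccc", "ddd", "fff", "aaa", "ccc", "aaa"]

# Without sorting, no two equal lines are adjacent here,
# so every run has length 1 (like uniq -c without sort).
unsorted_runs = [(key, len(list(group))) for key, group in groupby(lines)]

# Sorting first puts equal lines next to each other (like sort | uniq -c).
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(lines))]

for key, count in sorted_runs:
    print(f"{count:7d} {key}")
```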

james2009

DeaD_EyE

james2009

deanhystad

clarabrandt

ndc85430