Python Forum
[Help] How to count Letter frequency in a text file?
#1
Hello,

I tried looking for "letter frequency" or "frequency distribution" within the forum but I couldn't find any old thread about the subject, unfortunately.

Here's the task that I'm trying to do:
Build a table with one row for each letter of the alphabet, a-z, and count how many times each letter appears in alice_in_wonderland.txt. (The fancy term for this kind of counting is "frequency distribution", because you are counting the frequency of something.)

a: 34,560
b: 5,027
...
z: 893
Store the results in a list of lists:
result = [
["a", 34560],
["b", 5027],
...
["z", 893]
]
Hint: Use Python's lower() method to convert every letter to lowercase before counting (so "A" counts towards "a"). Ignore non-alphabetic characters such as digits and punctuation; you can check for letters with Python's isalpha() method.

==============================================================================

See the attached alice_in_wonderland.txt file in this thread so you can play around with it.

As always, any feedback, as well as ideas, is much appreciated from any Python experts here.

Thank you kindly. :)

==============================================================================
from collections import defaultdict

filename = "alice_in_wonderland.txt"
file = open(filename, encoding="utf8")

def countletters(file):
	results = defaultdict(int)
	for line in file:
		for char in line:
			if char.lower() in filename:
				c = char.lower()
				results[c] += 1
	return results
print(countletters(file))
==============================================================================
Screenshot of my Terminal:

[Image: 07080-AinW_zpsga41asnk.png]

Attached Files

.txt   alice_in_wonderland.txt (Size: 169.53 KB / Downloads: 557)
Blockchain Visionary & Aspiring Encipher/Software Developer
me = {'Python Learner' : 'Beginner\'s Level'}
http://bit.ly/JoinMeOnYouTube
#2
You are testing to see if the character is in filename, but filename is 'alice_in_wonderland.txt'. So you are getting the characters 'acdeilnortwx_.'. As suggested in the instructions, test for char.isalpha() instead.

And if you are allowed to use collections, you might look at Counter.
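
For what it's worth, here is a minimal sketch of the Counter idea that also builds the list of lists the task asks for. The names letter_frequencies and sample are made up for illustration and are not from the original post. It checks against string.ascii_lowercase rather than isalpha(), on the assumption that only a-z should be counted, since isalpha() also accepts accented letters.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return [[letter, count], ...] for a-z, treating upper and lower case alike."""
    # Lowercase everything first, then keep only the 26 ASCII letters.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # A Counter returns 0 for missing keys, so letters that never appear still get a row.
    return [[letter, counts[letter]] for letter in string.ascii_lowercase]


# Hypothetical sample text; with the real file you could do something like:
#     with open("alice_in_wonderland.txt", encoding="utf8") as f:
#         result = letter_frequencies(f.read())
sample = "Alice was beginning to get very tired"
result = letter_frequencies(sample)
print(result[:3])
```

Reading the whole file with read() is fine for a file of this size; for very large files you could call counts.update() line by line instead.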
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
check https://docs.python.org/3/library/collec...ns.Counter
#4
@ichabod801 and @buran - thank you once again for taking the time to share your ideas!

I managed to restructure my code and it works fine, but my next question is: how can I make the output look like a table, as shown in the screenshot here?

[Image: 070818_zpshfys00vk.png]

==============================================================================

file = open("alice_in_wonderland.txt", "r", errors ='ignore') # open file
charcount = {} #dictionary to hold char counts
validchars = "abcdefghijklmnopqrstuvwxyz" # only these counted

print(": Letter : Frequency :")

for i in range(97,123): # lowercase range
    c = (chr(i)) # the chars a-z
    charcount[c] = 0 # initialize count

for line in file:
    words = line.split(" ") # line into words
    for word in words:  # words into chars
        chars = list(word) # convert word into a char list
        for c in chars:  # process chars
            if c.isalpha():  # only alpha allowed
                if c.isupper():
                    c = c.lower()  # if char is upper convert to lower
                if c in validchars: # if in valid char set
                    charcount[c] += 1 # increment count

print(charcount) # print the dictionary of counts

file.close() # close file
Thank you in advance!
#5
Look at the format method of strings. You can set up a template string that specifies the format for each row: how wide the columns are, what characters go between the columns, how to justify the text in the columns. Then you can loop through the data you collected, and send the character and the count to the format method of the template string, and print the result.
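
A rough sketch of that template idea, assuming a dictionary like the charcount one above. The sample numbers, the column widths, and the names row_template and format_table are placeholders for illustration only.

```python
# Hypothetical sample data; in the thread this would be the full charcount dictionary.
charcount = {"a": 8794, "b": 1475, "c": 2400}

# One template for every row: letter centred in 6 columns,
# count right-aligned in 9 columns, colons between the columns.
row_template = ": {:^6} : {:>9} :"


def format_table(counts):
    """Return the table as a list of lines, header first, letters in sorted order."""
    lines = [row_template.format("Letter", "Frequency")]
    for letter, count in sorted(counts.items()):
        lines.append(row_template.format(letter, count))
    return lines


for line in format_table(charcount):
    print(line)
```

Because the header goes through the same template as the data rows, the colons line up as long as no count is wider than the 9-column field. Changing {:>9} to {:>9,} would add thousands separators, as in the 34,560 shown in the task.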
#6
Look at string formatting.
New in Python 3.6 are f-strings, which I use here.
print(f'{":":<2} {"Letter":<2} {":":<2} {"Frequency":<5} {":":<2}')
for k,v in charcount.items():
    print(f'{":":<2} {k:^6} {":":<4} {v:<6} {":":>2}')
Output:
:  Letter :  Frequency :
:    a    :    8794    :
:    b    :    1475    :
:    c    :    2400    :
:    d    :    4934    :
:    e    :    13579   :
:    f    :    2001    :
:    g    :    2531    :
There are also third-party libraries for this, such as python-tabulate.

Edit: This ended up being similar to what @ichabod801 posted at the same time.
#7
Thank you very much, @ichabod801, for the input and the Format String Syntax link. :)
It was very helpful, and it goes very well with the Formatted string literals link from @snippsat!


OMG, thank you so much for sharing the code, @snippsat! It worked and my table is straight as well. :)

..and thanks for sharing the Formatted string literals document, I will definitely review it!

#MY OLD CODE [when I couldn't seem to make the table straight] lol

file = open("alice_in_wonderland.txt", "r", errors ='ignore') # open file
charcount = {} #dictionary to hold char counts
validchars = "abcdefghijklmnopqrstuvwxyz" # only these counted

print(": Letter : Frequency :")

for i in range(97,123): # lowercase range
    c = (chr(i)) # the chars a-z
    charcount[c] = 0 # initialize count

for line in file:
    words = line.split(" ") # line into words
    for word in words:  # words into chars
        chars = list(word) # convert word into a char list
        for c in chars:  # process chars
            if c.isalpha():  # only alpha allowed
                if c.isupper():
                    c = c.lower()  # if char is upper convert to lower
                if c in validchars: # if in valid char set
                    charcount[c] += 1 # increment count

for key, value in sorted(charcount.items()):
   print(':   ', key, '  :  ', value, '   :')

file.close() # close file
==============================================================================

#MY NEW CODE => Massive thanks to @snippsat! :)

file = open("alice_in_wonderland.txt", "r", errors ='ignore') # open file
charcount = {} #dictionary to hold char counts
validchars = "abcdefghijklmnopqrstuvwxyz" # only these counted

print(f'{":":<2} {"Letter":<2} {":":<2} {"Frequency":<5} {":":<2}')

for i in range(97,123): # lowercase range
    c = (chr(i)) # the chars a-z
    charcount[c] = 0 # initialize count

for line in file:
    words = line.split(" ") # line into words
    for word in words:  # words into chars
        chars = list(word) # convert word into a char list
        for c in chars:  # process chars
            if c.isalpha():  # only alpha allowed
                if c.isupper():
                    c = c.lower()  # if char is upper convert to lower
                if c in validchars: # if in valid char set
                    charcount[c] += 1 # increment count

for k,v in charcount.items():
    print(f'{":":<2} {k:^6} {":":<4} {v:<6} {":":>2}')

file.close() # close file
[Image: 00_zpsrbn655xl.png]