Python Forum

Full Version: average word length
Trying to write a simple program that calculates the average length of words used in a sentence. My issue is that the spaces are taken into account when counting characters, which inflates the average. The replace("","") call was supposed to eliminate spaces from the input, but it does not seem to work. Any ideas? Thank you.

def main():
    sentence = input("Enter text: ")
    words = len(sentence.split())
    chars = len(sentence.replace("",""))
    avg = chars / words
    print("Your average word length is:", round(avg))
main()
I would guess that you intended
chars = len(sentence.replace("",""))
to be
chars = len(sentence.replace(" ",""))
(Note the extra character in the second line.)
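
To see the difference, here is a small sketch (the sample sentence is made up for illustration). Replacing the empty string with the empty string hands back the same text, so the spaces are still counted:

```python
sentence = "the quick brown fox"  # hypothetical sample input

# "" -> "" changes nothing, so the length still includes the spaces
unchanged = sentence.replace("", "")
# " " -> "" removes every space character
no_spaces = sentence.replace(" ", "")

print(unchanged == sentence)  # True
print(len(unchanged))         # 19
print(len(no_spaces))         # 16
```

Note that replace(" ", "") only removes plain spaces; tabs or newlines in the input would still be counted.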

Alternatively, you can sum the lengths of the individual words produced by sentence.split() to get the number of characters.
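
A minimal sketch of that alternative, assuming Python 3 (so / is true division). The function name average_word_length is just a placeholder, and the empty-input guard is my own addition:

```python
def average_word_length(sentence):
    """Average number of characters per whitespace-separated word."""
    words = sentence.split()
    if not words:
        # avoid dividing by zero on empty or whitespace-only input
        return 0.0
    # split() already dropped the whitespace, so only word characters are summed
    return sum(len(word) for word in words) / len(words)


print(average_word_length("the quick brown fox"))  # 4.0
```

Punctuation attached to a word is still counted here.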
In order to get accurate results you should also take care of punctuation marks. If the entered text is 'Yes? No! Yes? No!', the computed average length is 1 higher than the real average word length (3.5 instead of 2.5).

One way to deal with it:

# remove all whitespace (spaces, tabs, newlines)
chars = ''.join(sentence.split())
# list of symbols you don't want to count as characters
nonchars = [".", "!", "?", ",", ":", ";", "-", "'"]
# count every remaining character that is not in the list above
letters = len([char for char in chars if char not in nonchars])
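
Wrapped into a function and run on the example text, the snippet above might look like this. It is only a sketch: count_letters and NONCHARS are names chosen here, and the symbol list is the same one as above, stored as a set:

```python
# symbols that should not be counted (same ones as in the list above)
NONCHARS = set(".!?,:;-'")


def count_letters(sentence):
    """Count characters that are neither whitespace nor in NONCHARS."""
    chars = "".join(sentence.split())
    return len([char for char in chars if char not in NONCHARS])


text = "Yes? No! Yes? No!"
print(len("".join(text.split())))                # 14, punctuation included
print(count_letters(text))                       # 10, letters only
print(count_letters(text) / len(text.split()))   # 2.5
```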
I think perfringo has the right idea, but that wouldn't capture double-quotes ("). I would suggest a whitelist rather than a blacklist
from string import letters
# [...]
# len() does not accept a generator expression, so count the matches with sum()
letter_count = sum(1 for char in chars if char in letters)
Thank you everyone. Your feedback not only resolved my issue but also taught me concepts I had not known before.

P.S. Regarding my post, I shall make the appropriate adjustments to my future posts. (It was my first time posting on this forum.)
(Jul-17-2018, 04:32 PM)micseydel Wrote: I think perfringo has the right idea, but that wouldn't capture double-quotes ("). I would suggest a whitelist rather than a blacklist
from string import letters
# [...]
# len() does not accept a generator expression, so count the matches with sum()
letter_count = sum(1 for char in chars if char in letters)

Whitelisting is definitely the way to go! It is much better to allow a specific set of letters than to try to guess what clever symbols users might enter.

It seems to me that there is no 'letters' in string.py. I get ImportError: cannot import name 'letters' from 'string'. Shouldn't it be:
from string import ascii_letters?
(Jul-18-2018, 06:14 AM)perfringo Wrote: Shouldn't it be
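You're right. string.letters only exists in Python 2; in Python 3 the import fails and ascii_letters is the name to use: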
Output:
$ python
Python 2.7.15 (default, May 1 2018, 16:44:08)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import letters
>>> letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>>
$ python3
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 03:03:55)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import letters
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'letters'
>>> from string import ascii_letters
>>>
Thanks for the catch :)
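
For anyone finding this thread later, here is a rough sketch that puts the whitelist idea together for Python 3. The function name and the empty-input guard are my own choices rather than anything posted above, and ascii_letters only covers unaccented a-z and A-Z:

```python
from string import ascii_letters

# whitelist of characters that count as letters
LETTERS = frozenset(ascii_letters)


def average_letters_per_word(sentence):
    """Average number of ASCII letters per whitespace-separated word."""
    words = sentence.split()
    if not words:
        # avoid dividing by zero on empty or whitespace-only input
        return 0.0
    # only whitelisted characters are counted; spaces, digits and
    # punctuation are ignored
    letter_count = sum(1 for char in sentence if char in LETTERS)
    return letter_count / len(words)


print(average_letters_per_word("Yes? No! Yes? No!"))  # 2.5
```

A token made up only of punctuation (a lone dash, for example) still counts as a word here, and accented letters are skipped. If those matter, str.isalpha() could be used for the test instead.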