Python Forum
Counting words in the last line of a file
#11
Since my last forum post, I have taken my script to the next level. Rather than just analyzing a basic 6 line text file, I present the user with a choice of three very long text files (full length books). I managed to dynamically adjust the range depending on the length of each book. I’ve also added docstrings to each of my functions.

I am really pleased with how my script is cooking so far. See the bottom of this post for my script in full.

There are still some issues I am trying to handle, though.

Near the end of the script, the user is prompted to replay using the again() function. When the user enters “C” (or anything other than A, B, or C), then the script should exit. That’s what I’m expecting. But instead (like when I enter “aaa” or whatever as input), the validate_and_choose() function is triggered. Could someone lend a hand with an explanation?
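
A likely explanation for this behaviour, offered as a sketch rather than a definitive diagnosis: a condition such as `replay == "A" or "a"` is evaluated as `(replay == "A") or "a"`, and the non-empty string `"a"` is always truthy, so the first branch runs for every input. Separately, a bare `exit` without parentheses only names the function and never calls it. The function names `buggy_is_a` and `classify` below are hypothetical and not part of the script.

```python
def buggy_is_a(replay):
    # Evaluated as (replay == "A") or "a"; the non-empty string "a" is
    # truthy, so the whole expression is truthy for every input.
    return bool(replay == "A" or "a")


def classify(replay):
    """Hypothetical helper: map raw input to 'same', 'different' or 'exit'."""
    choice = replay.strip().lower()
    if choice == "a":
        return "same"
    if choice == "b":
        return "different"
    return "exit"  # 'c' and anything else


print(buggy_is_a("aaa"))  # True
print(classify("aaa"))    # exit
print(classify("B"))      # different
```

Normalising the input once with `lower()` and comparing against a single value per branch avoids the need for `or` altogether; `replay in ("A", "a")` would be another option.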

Another issue with the script is at line 57 where I attempt to re-declare the chosen_book variable. There my linter says that chosen_book is an unused variable (even though it was already declared and used earlier in the script). I figure this could be because the variable is declared within again() and is therefore out of scope, meaning it is isolated within that particular function and has no access to variables declared or used elsewhere in the script. Is this correct? If this variable is out of scope as I suspect, then how do I get this working properly so that the book() function is triggered (as I have set out to do)?
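
On the scope question, a small sketch under the assumption that the linter warning comes from the assignment inside the function: assigning to a name inside a function creates a new local variable, so the module-level `chosen_book` is never changed and the local one is never read. The function names `rebind_locally` and `pick_new_book` are hypothetical stand-ins, not functions from the script.

```python
chosen_book = "Tolstoy.txt"  # module-level name, as in the script


def rebind_locally():
    # This assignment creates a new local variable; the module-level
    # chosen_book is untouched, and the local one is never used.
    chosen_book = "Alice.txt"


def pick_new_book():
    """Hypothetical replacement: return the choice instead of assigning it."""
    return "Alice.txt"


rebind_locally()
print(chosen_book)  # Tolstoy.txt

chosen_book = pick_new_book()  # the rebinding now happens at module level
print(chosen_book)  # Alice.txt
```

Returning the new value and passing it on as an argument is usually easier to follow than a `global` statement, although `global chosen_book` inside the function would also make the assignment affect the module-level name.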

I’m working with three text files (public domain books). The three books can be found here on my Dropbox. Tolstoy’s book is some 66054 lines long. I chose it because it is so large, in order to put my Python interpreter to work. It’s 3.4 MB, so it’s too big to attach here on this Python forum message board.

Here is my latest iteration of my script:
# This is my ninth iteration of this text reading script. 

def book():
    ''' This function presents the user with a selection of 3 potential books to examine '''
    options = {'1':'Tolstoy.txt','2':'Alice.txt','3':'Chesterton.txt'}
    print("Choose from this list of books: \n 1. Tolstoy \n 2. Alice \n 3. Chesterton")
    pick = input("What is your pick? 1.? 2.? or 3.?")
    selection = options.get(pick)  
    print(f"You picked: {selection[:-4]}!") # for testing
    return selection
    
def showcase():
    ''' This function essentially prints the entire book, line by line (but also prints the associated line numbers '''
    with open(chosen_book, 'r') as lines:
        lines2 = lines.readlines()
        for num, line in enumerate(lines2):
            print(num, line.rstrip())
        return lines2

def how_long(lines2):
    ''' This function counts the number of lines (for future use) '''
    length = len(lines2)  
    return length

def validate_and_choose(request, allowed_range):
    ''' This function ensures the user input is an integer and within the range of number of lines '''
    range_text = f'integer in range {min(allowed_range)} - {max(allowed_range)}'
    while True:
        answer = input(f'{request} (enter {range_text}) ')
        try:
            answer = int(answer)
            if answer in allowed_range:
                return answer        
            if answer not in allowed_range:
                raise ValueError  
        except ValueError:
            print(f'Expected {range_text} but input was "{answer}". Try again! ')

def main(choice):   
    ''' This function opens, reads each line and then processes the user's choice and presents the final user's output '''  
    with open(chosen_book, 'r') as lines:
        picked_line = lines.readlines()[choice]
        picked_line = picked_line.rstrip()
        result_num = picked_line.split()
        length_of_line = len(result_num)
        num_chars = len("".join(picked_line))
        print(f'Here is the line that you picked:\n  "{picked_line}" \nThe number of words: {length_of_line}\nThe number of characters: {num_chars}')

def again():
    ''' This function gives the user the ability to (1) start from the very beginning at the top, (2) restart half way through  or (3) exit '''
    replay = str(input("\nWould you like to choose a new line in the same book? Or would you prefer to pick a line from a different book? Enter:\n'A' for the same book, \n'B' for a different book, or \n'C' to exit this program \n Make your selection: "))
    if replay == "A" or "a":
        final = validate_and_choose('\nWhich line do you want to count and print? ', range(0,length))
        main(final)
        again()
    elif replay == "B" or "b":
        chosen_book = book()
        again()
        pass
    elif replay == "C" or "c":
        print("Goodbye!")
        exit
    else:
        print("I'll take that answer as a request to exit this program. Goodbye for now!")
        pass

if __name__ == "__main__":
    chosen_book = book()
    num_lines = showcase()
    length = how_long(num_lines)
    final = validate_and_choose('Which line do you want to count and print? ', range(0,length))
    main(final)
    again()
#12
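Maybe this serves as inspiration for you.
Don't look too long at the utf8_in_memory calculation.
It sums sys.getsizeof() over all lines, so it is only a rough estimate and does not show the real size in memory.
Counting characters would not give the real size either: a character can take one byte or more
when it's encoded as UTF-8, which is the default in Python.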


#!/usr/bin/env python3

"""
Description

# This is my ninth iteration of this text reading script.
"""

import sys


BOOKS = {1: 'Tolstoy.txt', 2: 'Alice.txt', 3: 'Chesterton.txt'}


def read_all_books():
    """
    Read all books from global variable BOOKS
    The keys are the digits
    """
    result = {}
    for number, filename in BOOKS.items():
        with open(filename) as fd:
            book_lines = fd.read().splitlines()
        result[number] = book_lines
        # maybe adding some metadata to the book
    return result


def which_book():
    """
    This function presents the user with a selection
    of 3 potential books to examine.
    """
    print("\nChoose from this list of books: \n 1. Tolstoy \n 2. Alice \n 3. Chesterton")
    while True:
        answer = input("What is your pick (1, 2, 3)? ").strip()
        # compare as text, so non-numeric input cannot raise an exception
        if answer in map(str, BOOKS):
            return int(answer)
        print(f'{answer} is not in the list. Enter a valid number in the range of available books.')


def showcase(book):
    """
    This function essentially prints the entire book,
    line by line (but also prints the associated line numbers)
    """
    max_num = len(str(len(book))) + 1 # i know it's silly
    # just want to know how long the last linenumber is
    for num, line in enumerate(book):
        print(f'{num:>{max_num}}: {line.rstrip()}')
    # we don't return anything
    # the books are already loaded

 
def validate_and_choose(allowed_range):
    """
    This function ensures the user input is an integer
    and within the range of number of lines.
    """
    request = '\nWhich line do you want to count and print?\n'
    range_text = f'integer in range {min(allowed_range)} - {max(allowed_range)}'
    while True:
        answer = input(f'{request}Enter {range_text}: ')
        try:
            answer = int(answer)
            if answer in allowed_range:
                return answer        
            if answer not in allowed_range:
                raise ValueError  
        except ValueError:
            print(f'Expected {range_text} but input was "{answer}". Try again! ')

 
def again():
    """
    This function gives the user the ability to
    
    (1) start from the very beginning at the top,
    (2) restart half way through
    (3) exit
    """
    replay = input(
        "\nWould you like to choose a new line in the same book?\n" 
        "Or would you prefer to pick a line from a different book?\n"
        "\n'A' for the same book, \n'B' for a different book, "
        "or \n'C' to exit this program \n Make your selection: ")
    return replay.lower()


def main_menu():
    """
    Main menu + Loop for this game
    """
    print('Welcome to my game.')
    print('Maybe some help..')
    print(f'All books together take {utf8_in_memory / 1024**2:.2f} MiB in memory')
    # it's not right. It consumes more memory because of
    # the overhead of the dict itself and the list as holder for
    # the lines of the books
    print()
    book_key = which_book()
    # just a number, which is the key of the dict book_data
    while True:
        book = book_data[book_key]
        print()
        showcase(book)
        print()
        line_index = validate_and_choose(range(0, len(book)))
        words = len(book[line_index].split())
        # characters = len(book[line_index])
        characters = len(book[line_index].replace(' ', ''))
        # only characters, no whitespaces
        print(f'Here is the line that you picked: "{line_index}"\n'
              f'The number of words: {words}\n'
              f'The number of characters: {characters}')
        replay = again()
        if replay == 'a':
            continue
        elif replay == 'b':
            book_key = which_book()
        elif replay == 'c':
            print("Goodbye!")
            return 0
        else:
            print(
                "I'll take that answer as a request "
                "to exit this program. Goodbye for now!"
                )
            return 1
 
 
if __name__ == "__main__":
    book_data = read_all_books()
    # book_data on module level.
    # it could be in main()
    # but then you must pass this around
    utf8_in_memory = sum(sys.getsizeof(line) for book in book_data.values() for line in book)
    try:
        retval = main_menu()
    except KeyboardInterrupt:
        retval = 10
        print('\nGoodbye!')
    sys.exit(retval) # maybe as information for other shell citizens
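
To illustrate why the number above is only an estimate, here is a minimal sketch comparing three different measures for one string. The sample text is made up, and the value reported by sys.getsizeof() depends on the Python implementation and version, so no exact figure is claimed for it.

```python
import sys

line = "naïve café"  # made-up sample: 10 characters, two of them non-ASCII

chars = len(line)                       # number of characters (code points)
utf8_bytes = len(line.encode("utf-8"))  # bytes when encoded as UTF-8
object_size = sys.getsizeof(line)       # size of the str object incl. overhead

print(chars, utf8_bytes, object_size)
```

The first two values are 10 and 12 here, because each of the two accented letters needs two bytes in UTF-8. The third value is larger still, since every str object carries some bookkeeping overhead.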

I found additional resources on how to determine the real size in memory:

Just look them over if you're interested. If the total size of the books is less than 500 MiB, you don't have to think about it very much; that is still feasible on a Raspberry Pi 3. But if you want to continue with your program and add more books, many books, you need a different method. One way could be a program which parses all the books once and saves the metadata to a file (JSON, pickle, CSV, ...).
Then you don't have to load the whole books. The size is reduced to the number of lines of all books, with maybe two or three integers per line (character count of the line, word count of the line). Just an idea. Other people would prefer a database. If you want to use persistent storage, sqlite3 is also a good option.
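
A minimal sketch of the metadata idea, assuming JSON as the file format and one [character count, word count] pair per line. The function names, the file name metadata.json and the sample lines are all hypothetical; the real script would feed in the lines of the actual book files.

```python
import json
import tempfile
from pathlib import Path


def line_metadata(lines):
    """Return one [character_count, word_count] pair per line."""
    return [[len(line), len(line.split())] for line in lines]


def save_metadata(books, path):
    """Write {book_name: [[chars, words], ...]} to a JSON file."""
    data = {name: line_metadata(lines) for name, lines in books.items()}
    Path(path).write_text(json.dumps(data), encoding="utf-8")


def load_metadata(path):
    """Read the metadata back without touching the book files."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Tiny in-memory stand-in for the real book files (assumption for the demo).
sample = {"Alice": ["Down the Rabbit-Hole", "",
                    "Alice was beginning to get very tired"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "metadata.json"
    save_metadata(sample, target)
    restored = load_metadata(target)

print(restored["Alice"][0])  # [20, 3]
```

With this layout a later run only has to read the small JSON file to answer word-count and character-count questions; the text of a line would still need to be fetched from the book file if it is to be displayed.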
#13
Thank you, @DeaD_EyE! Seeing an alternate solution like this is enormously helpful. You clearly spent a lot of time and put a lot of effort into rewriting my whole project.

I understand most of your script, line-by-line. However you are right that the utf8_in_memory variable with the list comprehension technique doesn't quite come naturally to me yet.
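
For reference, the expression assigned to utf8_in_memory is a generator expression with two for clauses, and it can be unrolled into ordinary nested loops. The sketch below uses a small made-up book_data dictionary as a stand-in for the loaded books.

```python
import sys

# Stand-in for book_data: {key: list of lines} (made-up sample lines).
book_data = {
    1: ["first line", "second line"],
    2: ["another book", "", "last line"],
}

# Generator expression form, as in the script.
total_compact = sum(sys.getsizeof(line)
                    for book in book_data.values()
                    for line in book)

# The same computation written as explicit nested loops.
total_loops = 0
for book in book_data.values():   # the first "for" clause is the outer loop
    for line in book:             # the second "for" clause is the inner loop
        total_loops += sys.getsizeof(line)

print(total_compact == total_loops)  # True
```

The for clauses read left to right in the same order as the nested loops read top to bottom.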

I'll return here soon with some more questions. Thanks again, my friend!!
#14
@DeaD_EyE, your script runs very well.

I’m trying to extend your awesome script in a small way. In the main_menu function in your while loop as is, it prints the client’s selection as an integer (your line 114). Right after that line, I’m now trying to tell Python to print the content of the client’s selection. For example, if the client chooses ‘2988’ in Alice.txt then I’d like the script to print that specific line in the text file: “You did!' said the Mad Hatter”.

To make this happen, here are the changes I’ve made so far.

To the print call in that loop, I added: f'Here is the content of the line that you picked: \n"{request}"\n'. To the validate_and_choose function I added an input function and globalized this new variable:

   global request
   request = input('\nWhich line do you want to count and print? >>  \n')
I decided to globalize the request variable so I can use it later in the script in a different function. I also initialized the request variable at the top of your script with request = None. It runs without any syntax error but I’m not achieving my desired output because right now it just prints the integer value as a string: “2988”. I figure I need to grab the enumerated line iterable which corresponds to the num iterable, similar to the enumerated loop earlier in the script. But I am at a loss here.

How do I tell Python to print the content of the line selected by the user in the text file of their choice?

Here is your script with my changes:

#!/usr/bin/env python3
# 
# # This is my tenth iteration of this text reading script. 

import sys
 

BOOKS = {1: 'Tolstoy.txt', 2: 'Alice.txt', 3: 'Chesterton.txt'}
request = None


def read_all_books():
    """
    Read all books from global variable BOOKS
    The keys are the digits
    """
    result = {}
    for number, filename in BOOKS.items():
        with open(filename) as fd:
            book_lines = fd.read().splitlines()
        result[number] = book_lines
        # maybe adding some metadata to the book
    return result
 
 
def which_book():
    """
    This function presents the user with a selection
    of 3 potential books to examine.
    """
    print("\nChoose from this list of books: \n 1. Tolstoy \n 2. Alice \n 3. Chesterton")
    while True:
        answer = input("What is your pick (1, 2, 3)? ").strip()
        # compare as text, so non-numeric input cannot raise an exception
        if answer in map(str, BOOKS):
            return int(answer)
        print(f'{answer} is not in the list. Enter a valid number in the range of available books.')
 
 
def showcase(book):
    """
    This function essentially prints the entire book,
    line by line (but also prints the associated line numbers)
    """
    max_num = len(str(len(book))) + 1 # i know it's silly
    # just want to know how long the last linenumber is
    for num, line in enumerate(book):
        print(f'{num:>{max_num}}: {line.rstrip()}')
    # we don't return anything
    # the books are already loaded
 
  
def validate_and_choose(allowed_range):
    """
    This function ensures the user input is an integer
    and within the range of number of lines.
    """
    global request
    request = input('\nWhich line do you want to count and print? >>  \n')    
    range_text = f'integer in range {min(allowed_range)} - {max(allowed_range)}'
    while True:
        answer = input(f'Enter {range_text}: ')
        try:
            answer = int(answer)
            if answer in allowed_range:
                return answer        
            if answer not in allowed_range:
                raise ValueError  
        except ValueError:
            print(f'Expected {range_text} but input was "{answer}". Try again! ')
    
  
def again():
    """
    This function gives the user the ability to
     
    (1) start from the very beginning at the top,
    (2) restart half way through
    (3) exit
    """
    replay = input(
        "\n------------------------------------------------------\n" 
        "\nWould you like to choose a new line in the same book?\n" 
        "Or would you prefer to pick a line from a different book?\n"
        "\n'A' for the same book, \n'B' for a different book, "
        "or \n'C' to exit this program \n Make your selection: ")
    return replay.lower()
 
 
def main_menu():
    """
    Main menu + Loop for this game
    """
    print('Welcome to my game.')
    print('Maybe some help..')
    print(f'All books together take {utf8_in_memory / 1024**2:.2f} MiB in memory')
    # it's not right. It consumes more memory because of
    # the overhead of the dict itself and the list as holder for
    # the lines of the books
    print()
    book_key = which_book()
    # just a number, which is the key of the dict book_data
    while True:
        book = book_data[book_key]
        print()
        showcase(book)
        print()
        line_index = validate_and_choose(range(0, len(book)))
        words = len(book[line_index].split())
        # characters = len(book[line_index])
        characters = len(book[line_index].replace(' ', ''))
        # only characters, no whitespaces
        print(f'Here is the line # that you picked: "{line_index}"\n'
              f'Here is the content of the line that you picked: \n"{request}"\n'
              f'The number of words: {words}\n'
              f'The number of characters: {characters}')
        replay = again()
        if replay == 'a':
            continue
        elif replay == 'b':
            book_key = which_book()
        elif replay == 'c':
            print("Goodbye!")
            return 0
        else:
            print(
                "I'll take that answer as a request "
                "to exit this program. Goodbye for now!"
                )
            return 1
  
  
if __name__ == "__main__":
    book_data = read_all_books()
    # book_data on module level.
    # it could be in main()
    # but then you must pass this around
    utf8_in_memory = sum(sys.getsizeof(line) for book in book_data.values() for line in book)
    try:
        retval = main_menu()
    except KeyboardInterrupt:
        retval = 10
        print('\nGoodbye!')
    sys.exit(retval) # maybe as information for other shell citizens
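
One possible way to get the desired output, sketched under the assumption that the book is already loaded as a list of lines (as book is in main_menu): the index returned by validate_and_choose can be used directly as book[line_index], so no extra input call or global variable is needed. The helper name describe_line and the three sample lines are hypothetical.

```python
def describe_line(book, line_index):
    """Hypothetical helper: build the report for one chosen line."""
    content = book[line_index]  # the text of the chosen line
    words = len(content.split())
    characters = len(content.replace(' ', ''))
    return (f'Here is the line # that you picked: "{line_index}"\n'
            f'Here is the content of the line that you picked:\n"{content}"\n'
            f'The number of words: {words}\n'
            f'The number of characters: {characters}')


# Stand-in for a loaded book (a list of lines, as read_all_books produces).
book = ["CHAPTER VII.", "A Mad Tea-Party", "'You did!' said the Mad Hatter"]
print(describe_line(book, 2))
```

In the script itself the same effect would come from replacing {request} in the print call with {book[line_index]}.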
#15
for num, line in enumerate(book):
    print(f'{num:>{max_num}}: {line.rstrip()}')
This for loop uses the enumerate function to print each line of the file along with the corresponding line number. What perplexes me is the first replacement field inside the f-string. Would someone kindly explain {num:>{max_num}}? What does the greater-than sign indicate in this context? My understanding was that this is a boolean test, where num is printed only if num is greater than max_num. This is obviously wrong, because my explanation is inconsistent with the output. Could someone please clarify?
#16
(May-31-2019, 10:18 AM)Drone4four Wrote: Would someone kindly explain, {num:>{max_num}}?
It's string formatting: <, ^ and > align left, center and right.
>>> left = '<left aligned>'
>>> middle = '<stay in middle>'
>>> print(f'aaa {left:<20} bbb {middle:^30} ccc')
aaa <left aligned>       bbb        <stay in middle>        ccc


>>> print(f"Sammy has {4:4} red and {16:16}! blue balloons")
Sammy has    4 red and               16! blue balloons

 
>>> for word in 'f-strings are awesome'.split():
...     print(f'{word.upper():~^20}')
...     
~~~~~F-STRINGS~~~~~~
~~~~~~~~ARE~~~~~~~~~
~~~~~~AWESOME~~~~~~~
 
>>> for word in 'f-strings are awesome'.split():
...     print(f'{word.upper():~<20}')
...     
F-STRINGS~~~~~~~~~~~
ARE~~~~~~~~~~~~~~~~~
AWESOME~~~~~~~~~~~~~

>>> for word in 'f-strings are awesome'.split():
...     print(f'{word.upper():~>20}')
...     
~~~~~~~~~~~F-STRINGS
~~~~~~~~~~~~~~~~~ARE
~~~~~~~~~~~~~AWESOME
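
To connect this back to {num:>{max_num}}: the inner braces are a nested replacement field, so the width is taken from a variable instead of being written as a literal number. A minimal sketch, using a made-up twelve-line list in place of a real book:

```python
book = [f"line {n}" for n in range(12)]  # made-up stand-in for a loaded book

# Width of the largest line number plus one column of padding,
# mirroring the max_num calculation in showcase().
max_num = len(str(len(book))) + 1

# ">" right-aligns num in a field that is max_num characters wide.
rows = [f'{num:>{max_num}}: {line}' for num, line in enumerate(book)]
print("\n".join(rows))
```

Here max_num is 3, so every line number is right-aligned in a field three characters wide and the colons line up in one column.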