Python Forum
Can someone help me optimize this game for large number of strings - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Homework (https://python-forum.io/forum-9.html)
+--- Thread: Can someone help me optimize this game for large number of strings (/thread-27878.html)

Pages: 1 2


Can someone help me optimize this game for large number of strings - Emekadavid - Jun-25-2020

I was given an assignment to create a game that takes a string as input and then evaluate the substrings starting with a consonant and the substrings starting with a vowel and print out which side, the vowel side or the consonant side, has a higher count. If a substring within a consonant or vowel repeats, the score for that substring is increased by one. Like if within a string like 'banana' substrings starting with consonants are b, ba, bana, banan, n, n, na, nan, etc, you can see that n and na repeats so they have scores of 2 while the rest have scores of 1. I will total the scores on the consonent side and compare it with the scores on the vowel side and print out the winner or if there is a draw. Please, can someone show me how to optimize this code to accept long strings. That is where the code fails. It gives runtime error to me or hangs my ide.
def subsConsonant(text, startswith):
    ''' text - the string that we used as input 
        startswith - the string of vowels 
        ---
        returns a list of the substrings in the text that starts
             with a consonant 
    '''
    dlist = [text[i:j+1] for i in range(len(text)) for j in range(i, len(text)) if text[i] not in startswith]
    return dlist
    
def subCount(aList):
    ''' aList - a list 
    ---
    creates a dictionary from the list, the keys as the unique items in
    the list and the values as the count of each of the items
    ---
    returns the sum of the count of all the keys in the dictionary 
    '''
    countDict = {}
    for i in range(len(aList)):
        if aList[i] in countDict:
            countDict[aList[i]] += 1
        else :
            countDict[aList[i]] = 1
    theSum = sum(value for value in countDict.values())
    return theSum

def subsVowels(text, startswith):
    ''' text - string, the input string
        startswith - the string of vowels
    ---
    if a letter in text is a vowel, it takes all the substrings 
    of that letter in the text and appends it to a list. 
    ---
    returns the list of substrings starting with a vowel 
    '''
    dlist = [text[i:j+1] for i in range(len(text))for j in range(i, len(text)) if text[i] in startswith]
    return dlist

def gameDecision(stuartScore, kevinScore):
    '''
    stuartScore - the score attributed to Stuart in the game 
    kevinScore - the score attributed to Kevin in the game 
    ---
    evaluates which of the scores is higher and prints out the 
    higher score, otherwise it is a draw and it prints this out. 
    ---
    return None
    '''
    if stuartScore > kevinScore : 
        print('Stuart', stuartScore)
    elif kevinScore > stuartScore :
        print('Kevin', kevinScore)
    else :
        print('Draw')
        
def minion_game(string):
    # your code goes here
    assert 0<len(string)<=1000000
    vowels = 'AEIOU'
    stuartScore = subCount(subsConsonant(string, vowels))
    kevinScore = subCount(subsVowels(string, vowels))
    gameDecision(stuartScore, kevinScore)

if __name__ == '__main__':
    s = input()
    minion_game(s)
If you give it a string like 'BANANA' it will output
Output:
'Stuart 12',
that is Stuart is the winner who is the consonant side with a score of 12. I have attached a txt file with a string of very long length that did not succeed with the game. If someone can try out the txt file and show me how to make it succeed, I will be grateful. The text file is named testcase.txt.


RE: Can someone help me optimize this game for large number of strings - GOTO10 - Jun-26-2020

Your program requires a lot of overhead because you are converting your string into lists of substrings unnecessarily, and also creating a dictionary that you don't need. Keep in mind that you don't need to store the values you are counting, you just need to count them and store that value. Also, don't forget that a string behaves like a list, in that you can iterate through it and use indexes with it.

With your current code, try your "BANANA" test case and have your subsConsonant() and subsVowels() functions print the length of the list they are returning. Notice anything about the length of the list returned by subsConsonant() and how it correlates to the player's scores?

I can think of one other hint to share, but it might make the solution too obvious. Let me know if you are still stuck.


RE: Can someone help me optimize this game for large number of strings - Emekadavid - Jun-26-2020

I will try out counting directly and see how it goes. Thanks for the idea.

(Jun-26-2020, 05:54 PM)GOTO10 Wrote: I can think of one other hint to share, but it might make the solution too obvious. Let me know if you are still stuck.
I am still stuck. I followed your advice and did only the counting, removing the lists and dictionaries. I am pasting my code below after following your advice. Now, the code fails on only three test cases out of 14. It says that the reason for the failure was because the code did not execute within the time limit. Can we optimize it further? I am really excited about doing this because I am learning new tricks and if you can help me, I will be delighted. Here is my eventual code:
def count_cons(text, vowels):
    countVowels = 0
    countConsonants = 0
    for i in range(len(text)):
        if text[i] not in vowels :
            for j in range(i, len(text)):
                countConsonants += 1    
        else :
            if text[i] in vowels :
                for j in range(i, len(text)):
                    countVowels += 1
    if countConsonants > countVowels :
        print('Stuart', countConsonants)
    elif countVowels > countConsonants :
        print('Kevin', countVowels)
    else :
        print('Draw')                                                

def minion_game(string):
    # your code goes here
    assert 0<len(string)<=1000000
    vowels = 'AEIOU'
    count_cons(string, vowels)
    
if __name__ == '__main__':
    s = input()
    minion_game(s)



RE: Can someone help me optimize this game for large number of strings - deanhystad - Jun-27-2020

This is a really expensive way to do very simple math
            for j in range(i, len(text)):
                countConsonants += 1    
This is a logic problem, not a programming problem. Step away from the computer and think about how to solve the problem, not how you would write a program to solve the problem.


RE: Can someone help me optimize this game for large number of strings - Emekadavid - Jun-27-2020

I don't think it is a logic problem. I have modeled that loop several ways. The loop is counting substrings in string that is why it was designed that way. If you can help me I would be grateful. I have been on that problem since Monday. I need help that's why I came here. If you can tell me what I am not seeing in counting the substrings I would be grateful. The programming solution is okay but I was told that it is not efficient. That is the problem.


RE: Can someone help me optimize this game for large number of strings - Emekadavid - Jun-27-2020

(Jun-27-2020, 04:25 PM)deanhystad Wrote: This is a really expensive way to do very simple math
            for j in range(i, len(text)):
                countConsonants += 1    
This is a logic problem, not a programming problem. Step away from the computer and think about how to solve the problem, not how you would write a program to solve the problem.
I decided to look into the model/logic based on what you said and removed the loop. Look at the new code. But the test cases still say that it did not execute within the time limit. Please, if you can help just help me.
def count_cons(text, vowels):
    countVowels = 0
    countConsonants = 0
    textlength = len(text)
    for i in range(textlength):
        if text[i] not in vowels :
            countConsonants += len(text[i:])
        else :
            if text[i] in vowels :
                countVowels += len(text[i:])
    if countConsonants > countVowels :
        print('Stuart', countConsonants)
    elif countVowels > countConsonants :
        print('Kevin', countVowels)
    else :
        print('Draw')                                                

def minion_game(string):
    # your code goes here
    assert 0<len(string)<=1000000
    vowels = 'AEIOU'
    count_cons(string, vowels)
if __name__ == '__main__':
    s = input()
    minion_game(s)
If anyone can give me the concept I am missing in order to optimize the code to run within a very short time, I will research it and try it out. That is why I came here.


RE: Can someone help me optimize this game for large number of strings - deanhystad - Jun-27-2020

Why do you test if the letter is in vowels and then test if it is not in vowels?


RE: Can someone help me optimize this game for large number of strings - GOTO10 - Jun-27-2020

You are making progress, but the logic required for this problem is actually much simpler than you realize (and you are not alone, it is easy to overthink this one). Where it gets confusing is when you think of things like "N appears twice so it gets 2 points". You don't really need to parse it out that way, you just need to get the total number of substrings that start with consonants and the total that start with vowels. As long as you are getting the correct total count, you don't need to be concerned with how many times a given substring exists.

From your test case using "BANANA", you know that Stuart's score (count of substrings beginning with a consonant) is 12. Note that there are 6 substrings that begin with B (B, BA, BAN, BANA, BANAN, BANANA), 4 substrings that start with the first N (N, NA, NAN, NANA), and 2 that start with the final N (N, NA). That adds up to 12.


RE: Can someone help me optimize this game for large number of strings - Emekadavid - Jun-27-2020

(Jun-27-2020, 07:56 PM)GOTO10 Wrote: From your test case using "BANANA", you know that Stuart's score (count of substrings beginning with a consonant) is 12. Note that there are 6 substrings that begin with B (B, BA, BAN, BANA, BANAN, BANANA), 4 substrings that start with the first N (N, NA, NAN, NANA), and 2 that start with the final N (N, NA). That adds up to 12.
My code is okay for the banana test case. Where it fails is where the string input is very long, like 100,000 characters. I want something that would scale to a string that is unknown of such a length. I followed your recommendation and what you described above is what I have done. See the last code above. I have removed the inner loop and counted the strings directly. But still, it doesn't scale to high string lengths.

(Jun-27-2020, 07:28 PM)deanhystad Wrote: Why do you test if the letter is in vowels and then test if it is not in vowels?
That is the essence of the game. Given a string, count the number of substrings that start with a vowel, then count the number of substrings that start with a consonant. Then, print out the higher score, and if a draw, print out that it is the same score. My code scales for medium number of string lengths, but for high number of string lengths, like lengths of 100,000 characters, it fails to run to completion within the set time. I am looking for how to make it scale to such high string lengths.


RE: Can someone help me optimize this game for large number of strings - GOTO10 - Jun-27-2020

(Jun-27-2020, 08:43 PM)Emekadavid Wrote:
(Jun-27-2020, 07:28 PM)deanhystad Wrote: Why do you test if the letter is in vowels and then test if it is not in vowels?
That is the essence of the game. Given a string, count the number of substrings that start with a vowel, then count the number of substrings that start with a consonant. Then, print out the higher score, and if a draw, print out that it is the same score. My code scales for medium number of string lengths, but for high number of string lengths, like lengths of 100,000 characters, it fails to run to completion within the set time. I am looking for how to make it scale to such high string lengths.

Think about how you would determine if an unknown letter was a consonant or a vowel outside of a programming context. If someone told you they were thinking of a letter and it wasn't a vowel, would you then ask them if it was a consonant or would you already have enough information to determine that?

(Jun-27-2020, 08:43 PM)Emekadavid Wrote: My code is okay for the banana test case. Where it fails is where the string input is very long, like 100,000 characters. I want something that would scale to a string that is unknown of such a length. I followed your recommendation and what you described above is what I have done. See the last code above. I have removed the inner loop and counted the strings directly. But still, it doesn't scale to high string lengths.

I may have been looking at the wrong version of your code when I made my previous comment. Honestly, your code seems pretty close to optimal at this point to me. On my PC, it returns the result for your text file test case almost instantly. I modified your code to use what I *think* is a slightly simpler method of getting the substring count (instead of len(text[i:])), and I also handled vowels and consonants differently per my comment above, but it doesn't make much of a difference in discernible runtime. I even tried using a string about 10x as long as your test case and didn't see a lot of difference. Maybe it would be more appreciable on a slower machine, or maybe my code is no better optimized than yours. Big Grin

How are you being timed when you execute your test case? Are you looking at an average of execution times or just a single execution? Using something like the time module to calculate these tiny differences is not terrifically accurate.