Python Forum


Hello,

I am trying to shorten long strings by applying some rules to them, but I can't figure it out.

Here is the code I have tried so far, without success.
The string in endString is my end goal for the startString sample.
The rules are added as comments below.

Thank you in advance for your help.

startString = "Masonry - Concrete Block (Small) - Misc Air Layer - Insulation - Aluminium"
endString = "Msnr-CncrBlck(Smll)-MiscAirLyr-Insltn-Almnm"

#rules
#words => 4 chars are left untouched
#words > 4 chars have all lowercase vowels removed
#special chars are kept, such as parenthesis and hyphens

words = startString.split()

for index in range(len(words)):
    if len(words[index]) > 4:
        vowels = ('a', 'e', 'i', 'o', 'u')
        for x in words[index]:
            if x in vowels:
                shorter = words[index].replace(x, "")

endStringByCode = "".join(shorter)
print (endStringByCode)

Don't you mean "<= 4" in line 5?

The problem is that you are doing

shorter = words[index].replace(x, "")

So let's look at what happens

Let words[index] be the string "I don't want any lower-case vowels".
The first time through the loop starting on line 15, your first match is the "o" in "don't". So shorter is set to
"I dn't want any lower-case vowels"
The next time there is a match it matches the a in want, so shorter is set to
"I don't wnt any lower-case vowels"
Do you see the problem?

What you want to do is always work on shorter, not on words[index]. That is a bit tricky, because once a letter is removed you have to manage the range. I'm not going to write the code for your homework, but this is as much help as I will give.
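To illustrate the difference described above, here is a minimal sketch with two hypothetical helper functions (`remove_vowels_buggy` and `remove_vowels_fixed` are illustrative names, not from the thread). The only change between them is which string each `replace` is applied to. Because the loop runs over the unchanged `word` while the result accumulates in `shorter`, this particular sketch needs no index bookkeeping.

```python
VOWELS = ('a', 'e', 'i', 'o', 'u')

def remove_vowels_buggy(word):
    shorter = word
    for x in word:
        if x in VOWELS:
            # Starts again from the untouched word every time,
            # so only the last replacement survives.
            shorter = word.replace(x, "")
    return shorter

def remove_vowels_fixed(word):
    shorter = word
    for x in word:
        if x in VOWELS:
            # Builds on the previous result, so every removal is kept.
            shorter = shorter.replace(x, "")
    return shorter

print(remove_vowels_buggy("Concrete"))  # Concrt
print(remove_vowels_fixed("Concrete"))  # Cncrt
```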

No rule about spaces but still they are removed. Why?

Also:

Masonry -> Msnr but Layer -> Lyr. Why is y removed in the first case but not in the second? Both words are longer than 4 chars.

Concrete -> Cncr. Why is t removed?

"words => 4 chars are left untouched" should probably be <=, otherwise the rules don't make sense.

OK, I am trusting this is not homework, which means I may be duped.
There are a number of issues. See this code, which gives what you want:

startString = "Masonry - Concrete Block (Small) - Misc Air Layer - Insulation - Aluminium"
endString = "Msnr-CncrBlck(Smll)-MiscAirLyr-Insltn-Almnm"
 
#rules
#words <= 4 chars are left untouched
#words > 4 chars have all lowercase vowels removed
#special chars are kept, such as parenthesis and hyphens
 
words = startString.split()
vowels = ('a', 'e', 'i', 'o', 'u', 'y')
out_string = ''

for word in words:
    if len(word) > 4:  # only words longer than 4 chars are shortened
        shorter = ''
        for x in word:
            if x not in vowels:
                shorter = shorter + x
    else:
        shorter = word
        
    out_string = out_string + shorter

print(out_string)

Any time you have range(len(... there is a better way. In this case, use for word in words; you don't have to mess with the index at all. I moved the definition of vowels outside the loop, as redefining it every time is inefficient. Rather than removing characters from the start string, I build the shorter string from acceptable characters if the word is long enough. (By the way, camelCase is frowned upon for variable names in Python; PEP 8 recommends lowercase with underscores, as in out_string.) I added 'y' as a vowel because in your example it is removed. Perfringo is right: by using split you lose the spaces, but I assume that is OK?
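The same approach can also be written with `str.join` and a generator expression instead of repeated `+`, which is the usual idiom for assembling a string from pieces. The sketch below assumes the "longer than 4 characters" reading of the rule and treats 'y' as a vowel, as in the code above. The names `shorten_word` and `shorten` are hypothetical. Running it also suggests that the rules, applied consistently, do not reproduce `endString` exactly: 'Concrete' becomes 'Cncrt' rather than 'Cncr', and 'Layer' becomes 'Lr' rather than 'Lyr'. These are the two cases perfringo asked about.

```python
START_STRING = "Masonry - Concrete Block (Small) - Misc Air Layer - Insulation - Aluminium"
END_STRING = "Msnr-CncrBlck(Smll)-MiscAirLyr-Insltn-Almnm"

# Assumption: 'y' counts as a vowel and only words longer than
# 4 characters are shortened.
VOWELS = "aeiouy"

def shorten_word(word):
    """Drop lowercase vowels from words longer than 4 characters."""
    if len(word) <= 4:
        return word
    return "".join(ch for ch in word if ch not in VOWELS)

def shorten(text):
    # split() discards the spaces, so the pieces are joined with nothing between them.
    return "".join(shorten_word(word) for word in text.split())

result = shorten(START_STRING)
print(result)                # Msnr-CncrtBlck(Smll)-MiscAirLr-Insltn-Almnm
print(result == END_STRING)  # False
```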

I'm going to post my version as well.

newwords = []
vowels = ('a', 'e', 'i', 'o', 'u')
for word in words:
    if len(word) > 4:
        for letter in word:
            if letter in vowels:
                word = word.replace(letter, '')
        newwords.append(word)
    else:
        newwords.append(word)
print(' '.join(newwords))

Output:
Msnry - Cncrt Blck (Smll) - Misc Air Lyr - Insltn - Almnm
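As an alternative to calling `replace` inside a loop, the standard library's `str.translate` can delete all the vowels in one call. This is a swapped-in technique, not the one used above. The sketch keeps menator01's rules (only 'a', 'e', 'i', 'o', 'u' removed, only words longer than 4 characters shortened, spaces preserved), and it should print the same output. The names `DROP_VOWELS` and `shorten` are illustrative.

```python
START_STRING = "Masonry - Concrete Block (Small) - Misc Air Layer - Insulation - Aluminium"

# str.maketrans with a third argument maps each of those characters to None,
# so translate() deletes them in a single pass over the word.
DROP_VOWELS = str.maketrans("", "", "aeiou")

def shorten(text):
    return " ".join(
        word.translate(DROP_VOWELS) if len(word) > 4 else word
        for word in text.split()
    )

print(shorten(START_STRING))
# Msnry - Cncrt Blck (Smll) - Misc Air Lyr - Insltn - Almnm
```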

(May-01-2021, 08:27 PM) jefsummers wrote: See this code, which gives what you want

Because the conditions are ambiguous, it's hard to say what the wanted output is. For example, should (oye) -> (oye) or (oye) -> ()? Your code does the latter, but I personally think the former is correct (a three-letter word in parentheses, i.e. an 'untouched word').
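If '(oye)' should stay '(oye)', as perfringo suggests, one way to get that is to measure only the letters of each word, so that punctuation never counts toward the 4-character limit. This is a sketch under that assumption. The regular expression and the helper names (`_shorten_match`, `shorten`) are illustrative, not from the thread. With it, '(oye)' is left alone while '(Small)' still becomes '(Smll)'.

```python
import re

# Assumption: a "word" is a run of ASCII letters, so brackets and hyphens
# do not count toward its length.
VOWELS = "aeiou"

def _shorten_match(match):
    word = match.group(0)
    if len(word) <= 4:
        return word  # short words are left untouched
    return "".join(ch for ch in word if ch not in VOWELS)

def shorten(text):
    # re.sub calls _shorten_match for every run of letters; everything else
    # (parentheses, hyphens) is copied through, and spaces are then dropped.
    return re.sub(r"[A-Za-z]+", _shorten_match, text).replace(" ", "")

print(shorten("(oye)"))                   # (oye)
print(shorten("Concrete Block (Small)"))  # CncrtBlck(Smll)
```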

By "gives what you want" I meant that the output matches his example.

I am concerned about some other examples, such as menator's, that modify the item being iterated over.

I do not understand. I modified the item being iterated? I'm still learning. Could you explain please? Thanks.

Menator - in line 5 you are iterating over word. In line 7 you modify word. So, let's say you are on the 6th letter of an 8-letter word, and you eliminate that letter. Python then moves to the 7th letter, which was originally the 8th letter, skipping the original 7th.
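A quick way to check this is to run both patterns side by side. The sketch below uses hypothetical function names. It suggests that skipping happens when a list is mutated in place while it is being iterated over. Rebinding a string name with `word = word.replace(...)` creates a new string, and the loop keeps iterating over the original one. That would explain why menator01's output above is still correct.

```python
def collect_letters_while_rebinding(word):
    """Rebind `word` inside the loop; return the letters visited and the result."""
    seen = []
    for letter in word:
        seen.append(letter)
        if letter in "aeiou":
            # Creates a new string and rebinds the name; the loop's iterator
            # still refers to the original string object.
            word = word.replace(letter, "")
    return seen, word

def remove_from_list_while_iterating(word):
    """Remove vowels from a list while iterating over that same list."""
    letters = list(word)
    for letter in letters:
        if letter in "aeiou":
            # Mutates the list in place: later elements shift left by one,
            # so the iterator skips the element right after the removed one.
            letters.remove(letter)
    return "".join(letters)

seen, result = collect_letters_while_rebinding("Aluminium")
print(len(seen), result)                              # 9 Almnm
print(remove_from_list_while_iterating("Aluminium"))  # Almnum
```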

(May-01-2021, 08:27 PM) jefsummers wrote: OK, I am trusting this is not homework, which means I may be duped.

I like this approach and I can understand your explanation.
Also, thank you for your advice on best practice.
It is also OK to lose the spaces, as my goal here is to make the whole string shorter while still keeping it human-readable.

Thank you very much for your help :)


ambrozote

supuflounder

perfringo

jefsummers

menator01

perfringo

jefsummers

menator01

jefsummers

ambrozote