Python Forum

Full Version: "Slicing and dicing strings" - - PyBite #105
I’m learning how to ‘slice and dice’ strings. Here is the code challenge:

Quote:https://codechalleng.es/bites/105/

Take the block of text provided and strip off the whitespace at both ends. Split the text by newline (\n).

Loop through the lines, for each line:

- strip off any leading spaces,
- check if the first character is lowercase,
- if so, split the line into words and get the last word,
- strip the trailing dot (.) and exclamation mark (!) from this last word,
- and finally add it to the results list.

Return the results list.

Here is my code on the first attempt:

from string import ascii_lowercase
 
text = """
One really nice feature of Python is polymorphism: using the same operation
on different types of objects.
Let's talk about an elegant feature: slicing.
You can use this on a string as well as a list for example
'pybites'[0:2] gives 'py'.
The first value is inclusive and the last one is exclusive so
here we grab indexes 0 and 1, the letter p and y.
 When you have a 0 index you can leave it out so can write this as 'pybites'[:2]
but here is the kicker: you can use this on a list too!
['pybites', 'teaches', 'you', 'Python'][-2:] would gives ['you', 'Python']
and now you know about slicing from the end as well :)
keep enjoying our bites!
"""
results = []
stripped = text.strip()
splitted = stripped.split("\n")
# naive debug:
print(f"First debug:{splitted}")
for line in splitted:
   # strip off any leading spaces:
   line.lstrip()
   line.rstrip()
   # naive debug:
   print(f"Second debug: {line}")
   # check if the first character is lowercase:
   if line[0].islower():
       # split the line into words and get the last word:
       new_line_split = line.split()
       last_word = new_line_split[-1]
       # naive debug:
       print(f"Third debug: {last_word}")
       # strip the trailing dot (.) and exclamation mark (!) from this last word:
       new_line_stripped = new_line_split.strip(".", "!")
       results = new_line_stripped.split()
I’ve done my best to use annotations to explain my rationale for using the class methods and operators that I do and why.

When I run the script, I get a traceback pointing to the second-to-last line: “AttributeError: ‘list’ object has no attribute 'strip'”. Since strings are ‘immutable’, they can’t be stripped in place, but new_line_split is not a string; it’s a list which I created a few lines earlier. Why am I getting this traceback? My own (incorrect) answer is that I am getting this AttributeError because I’m trying to modify a string, which can’t be modified. But it’s not a string as far as I can tell: I am deliberately manipulating a list and assigning the result to the variable new_line_stripped. My understanding here is clearly flawed. Could someone please clarify further?

That is my main question. I need a few hints from you people to overcome this traceback, without you solving it completely for me.

Also, if you people have any further comments or hints on how to improve my script in other areas, that would be appreciated too.

Here is the full traceback (including print debugging output for further hints at what is wrong with my script):

Quote:$ python slicing_basic.py
First debug:['One really nice feature of Python is polymorphism: using the same operation', 'on different types of objects.', "Let's talk about an elegant feature: slicing.", 'You can use this on a string as well as a list for example', "'pybites'[0:2] gives 'py'.", ' The first value is inclusive and the last one is exclusive so', 'here we grab indexes 0 and 1, the letter p and y.', " When you have a 0 index you can leave it out so can write this as 'pybites'[:2]", 'but here is the kicker: you can use this on a list too!', "['pybites', 'teaches', 'you', 'Python'][-2:] would gives ['you', 'Python']", 'and now you know about slicing from the end as well :)', 'keep enjoying our bites!']
Second debug: One really nice feature of Python is polymorphism: using the same operation
Second debug: on different types of objects.
Third debug: objects.
Traceback (most recent call last):
File "slicing_basic.py", line 36, in <module>
new_line_stripped = new_line_split.strip(".", "!")
AttributeError: 'list' object has no attribute 'strip'

Thank you.
When you split something, you make a list.
The error message says that a list has no attribute strip.
So...

Paul
This is illegal: strip(".", "!") (strip accepts at most one argument).
To do two in one line, use: strip(".").strip("!")
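To make that concrete, here is a small sketch. The sample word is made up for illustration and is not taken from the challenge text. It shows that a two-argument call fails, that chained calls depend on their order, and that a single string of characters handles any order:

```python
word = "bites!."  # hypothetical sample word, not from the challenge text

# str.strip() accepts at most one argument, so two arguments raise TypeError
try:
    word.strip(".", "!")
    two_args_error = None
except TypeError as exc:
    two_args_error = type(exc).__name__

# Chained calls run left to right, so the order matters
chained = word.strip(".").strip("!")         # removes '.', then '!'
reversed_chain = word.strip("!").strip(".")  # '!' is not at the end yet, so it survives

# One call listing every character to remove works regardless of order
combined = word.strip("!.")

print(two_args_error, chained, reversed_chain, combined)
```

Note that strip only works on a string, so it has to be called on the last word itself rather than on the list returned by split.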
In addition to what was previously said: note that you are aware that
new_line_split
is a list. Are you sure that lists have a strip method? Also
Quote:- strip the trailing dot (.) and exclamation mark (!) from this last word,
The key here is the last word. If I want to know the last word of new_line_split, how would I do so? This is a list after all. Have a look at the documentation on string methods, and also at the documentation on list methods: where does strip come from?

In addition I'd like to mention: I often spend time playing in the Python REPL and like to discover which methods are available for certain types. Sometimes I forget, or because I deal with multiple languages I need to refresh my memory, so I use things like dir(str) or dir(dict) on various types.
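As a rough illustration of that kind of REPL exploration, the sketch below checks where strip lives. It uses one line from the challenge text, and the variable names are only for illustration:

```python
line = "on different types of objects."
words = line.split()  # split() returns a list of strings

print(type(words).__name__)         # list
print(hasattr(words, "strip"))      # False: lists have no strip method
print(hasattr(words[-1], "strip"))  # True: each element is a str

# List the str methods whose name contains "strip"
strip_methods = [name for name in dir(str) if "strip" in name]
print(strip_methods)  # ['lstrip', 'rstrip', 'strip']
```

The same check with dir(list) shows no strip-like method, which matches the AttributeError in the traceback.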
@DPaul + @knackwurstbagel: Thanks for your advice. You are both right. I was previously handling a list which cannot be stripped. I’m now working with the .join() method. Only a string can be stripped. I’ve made a little bit more progress. As a refresher, here is what I am trying to do:

Quote:- strip off any leading spaces,
- check if the first character is lowercase,
- if so, split the line into words and get the last word,
- strip the trailing dot (.) and exclamation mark (!) from this last word,
- and finally add it to the results list.

I’m close. I’ve accomplished the first and second objective. But I am still not stripping the trailing dots (.) properly and I’m not appending or extending the results list properly either. Here is my latest attempt that I am working with now:

from string import ascii_lowercase
 
text = """
One really nice feature of Python is polymorphism: using the same operation
on different types of objects.
Let's talk about an elegant feature: slicing.
You can use this on a string as well as a list for example
'pybites'[0:2] gives 'py'.
The first value is inclusive and the last one is exclusive so
here we grab indexes 0 and 1, the letter p and y.
 When you have a 0 index you can leave it out so can write this as 'pybites'[:2]
but here is the kicker: you can use this on a list too!
['pybites', 'teaches', 'you', 'Python'][-2:] would gives ['you', 'Python']
and now you know about slicing from the end as well :)
keep enjoying our bites!
"""
results = []
stripped = text.strip()
splitted = stripped.split("\n")
# naive debug:
# print(f"First debug:{splitted}")
for line in splitted:
   # strip off any leading spaces:
   line.lstrip(' ')
   line.rstrip(' ')
   # naive debug:
   # print(f"Second debug: {line}")
   # check if the first character is lowercase:
   if line[0].islower():
       # split the line into words and get the last word:
       new_line_split = line.split()
       # naive debug:
       # print(f"Third debug: {new_line_split}")
       last_word = new_line_split[-1]
       # naive debug:
       # print(f"Fourth debug: {last_word}")
       results.extend(last_word)
       results_as_string = ''.join(results)
results_as_string.strip("!").strip(".")  # .split("!")
print(results_as_string)
There is no longer a traceback now. Here is my output: objects.y.too!:)bites!

According to the exercise, the expected output should simply be: ['objects', 'y', 'too', ':)', 'bites']

I am so close! Almost there. In my effort to get even closer to the desired end result, I've replaced the last two lines with these three instead:

results_as_string.rstrip("!")
splitted_result = results_as_string.split('.')
print(splitted_result)
With these three lines, here is my output: ['objects', 'y', 'too!:)bites!']

I’m even closer now! But I'm still not quite where I want to be.

Can someone provide some further hints?

Thanks @Larz60+ for the tip. I’ve replaced the instance of strip(".", "!") with strip(".").strip("!"). I experimented with using a similar form but instead of using strip, I used split(".").split("!"). That didn't work very well.
I see that you have extracted the last word. What happens if you call last_word.strip("!") on it? The other thing is that I happen to be a member of PyBites and solved that one long ago. I did it a completely different way, but I noticed in the problem description
Quote:- strip the trailing dot (.) and exclamation mark (!) from this last word,
that the "and" should be "or", and I have notified the PyBites team.

I would like to note that the Python documentation on the str.strip method says
Quote:Return a copy of the string with the leading and trailing characters removed.
Let us think about a situation where the line contains both:

results_as_string = 'Sed a tincidunt nisl Mauris!.'
results_as_string.strip('!').strip('.')
What does this really mean? If we chain the methods, the first call tries to take '!' off the end of results_as_string, but the last character is a '.', so what does it return? It returns the same string back, unmodified. The second call to strip then takes that result and looks for a '.'. It finds one and removes it. You would get back 'Sed a tincidunt nisl Mauris!'. So the dot was removed but the ! was ignored. However, the example in the Python documentation implies that you can provide a string with all the characters to be removed, like so:

results_as_string = 'Sed a tincidunt nisl Mauris!.'
results_as_string.strip('!.')
So now we understand: for every character you want to strip, put them all in one string. Aha! OK, but why are we adding last_word to results before stripping '!' or '.' from the end? Let's try something like this:

results = []
stripped = text.strip()
splitted = stripped.split("\n")
# naive debug:
# print(f"First debug:{splitted}")
for line in splitted:
   # strip off any leading spaces:
   line.lstrip(' ')
   line.rstrip(' ')
   # naive debug:
   # print(f"Second debug: {line}")
   # check if the first character is lowercase:
   if line[0].islower():
       # split the line into words and get the last word:
       new_line_split = line.split()
       # naive debug:
       # print(f"Third debug: {new_line_split}")
       last_word = new_line_split[-1]
       last_word_striped = last_word.strip('!.')
       # naive debug:
       # print(f"Fourth debug: {last_word} {last_word_striped}")
       results.extend(last_word_striped)
       results_as_string = ''.join(results)

print(results_as_string)
Stripping it before adding it to the results. I have not run this code, so I do not know if it all works, but it should work or be pretty close to working. Let me know.
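One detail worth checking in the loop above is the difference between list.extend and list.append when the argument is a string. A minimal sketch, using a few of the expected last words as sample data:

```python
last_words = ["objects", "y", "too"]  # sample data for illustration

extended = []
appended = []
for word in last_words:
    extended.extend(word)  # a str is iterable, so each character becomes an item
    appended.append(word)  # the whole string is added as a single item

print(extended)
print(appended)
```

With extend the word boundaries are lost, which is why joining the list afterwards gives one long run of characters.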
Hi @knackwurstbagel: Thank you for your feedback. I ran your code. Here is the output: objectsytoo:)bites. The ‘!.’ has been stripped, as you explained.

I’ve made more progress. I’ve achieved half the task, or so it appears to me.

This is my desired output: ['objects', 'y', 'too', ':)', 'bites']

This is my actual output: ['objects', 'y', 'too', ':)', 'bites']

You people are going to laugh when you see how inelegant my code turned out. Here is the script now, in full:

from string import ascii_lowercase

text = """
One really nice feature of Python is polymorphism: using the same operation
on different types of objects.
Let's talk about an elegant feature: slicing.
You can use this on a string as well as a list for example
'pybites'[0:2] gives 'py'.
The first value is inclusive and the last one is exclusive so
here we grab indexes 0 and 1, the letter p and y.
When you have a 0 index you can leave it out so can write this as 'pybites'[:2]
but here is the kicker: you can use this on a list too!
['pybites', 'teaches', 'you', 'Python'][-2:] would gives ['you', 'Python']
and now you know about slicing from the end as well :)
keep enjoying our bites!
"""
results = []
stripped = text.strip()
splitted = stripped.split("\n")
for line in splitted:
   line.lstrip(' ')
   line.rstrip(' ')
   if line[0].islower():
       new_line_split = line.split()
       last_word = new_line_split[-1]
       results.extend(last_word)
       results_as_string = ''.join(results)
print(results_as_string)
w = results_as_string.replace(".", "zzz")
print(w)
x = w.rstrip('!')
print(x)
y = x.replace("!", "zzz")
z = y[:-5] + "zzz" + y[-5:]
print(z)
a = z.split('zzz')
print(a)
I suppose it ‘works’ but it's ugly.

Here is my code refactored into a function:

from string import ascii_lowercase

def slice_and_dice(text):
    ''' 
    First task for this exercises
    '''
    results = []
    stripped = text.strip()
    splitted = stripped.split("\n")
    for line in splitted:
        line.lstrip(' ')
        line.rstrip(' ')
        if line[0].islower():
            new_line_split = line.split()
            last_word = new_line_split[-1]
            results.extend(last_word)
            results_as_string = ''.join(results)
    w = results_as_string.replace(".", "zzz")
    x = w.rstrip('!')
    y = x.replace("!", "zzz")
    z = y[:-5] + "zzz" + y[-5:]
    a = z.split('zzz')
    return a
Here is the test script:

from slicing_basic6 import text, slice_and_dice

another_text = """
Take the block of text provided and strip() off the whitespace at the ends.
Split the whole block up by newline (\n).
if the first character is lowercase, split it into words and add the last word
of that line to the results list.
Strip the trailing dot (.) and exclamation mark (!) from the word first.
 finally return the results list!
"""


def test_slice_and_dice_default_text():
   expected = ['objects', 'y', 'too', ':)', 'bites']
   assert slice_and_dice(text) == expected


def test_slice_and_dice_other_text():
   expected = ['word', 'list', 'list']
   assert slice_and_dice(another_text) == expected
When I run the test, here is the output:

 $ python -m pytest test_slicing.py
================================= test session starts ==================================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.8.1, pluggy-0.13.1
rootdir: /home/gnull/dev/projects/python/2018-and-2020/bitesofpy/Intro-freebies-101-110/inprogress/Bite 105 - Slice and dice
collected 2 items                                                                      

test_slicing.py .F                                                               [100%]

======================================= FAILURES =======================================
____________________________ test_slice_and_dice_other_text ____________________________

    def test_slice_and_dice_other_text():
        expected = ['word', 'list', 'list']
>       assert slice_and_dice(another_text) == expected
E       AssertionError: assert ['li', 'st', ''] == ['word', 'list', 'list']
E         At index 0 diff: 'li' != 'word'
E         Use -v to get the full diff

test_slicing.py:20: AssertionError
--------------------------------- Captured stdout call ---------------------------------
list.
listzzz
listzzz
lizzzstzzz
=============================== short test summary info ================================
FAILED test_slicing.py::test_slice_and_dice_other_text - AssertionError: assert ['li'...
============================= 1 failed, 1 passed in 0.05s ==============================
As you can see, it passes the first test. Hooray! But I have no clue how to re-write my code so that it automatically returns ['word', 'list', 'list'] based on the secondary text string declared in the test script.
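A possible reading of the failure, based on the captured stdout above: the slice y[:-5] assumes the final word is exactly five characters long, which holds for 'bites' but not for 'list'. The sketch below replays only the final steps of the function on an already-joined string. The helper name split_on_markers is hypothetical and used only for this illustration:

```python
def split_on_markers(joined):
    """Replay the 'zzz' steps from the function above on a joined string."""
    w = joined.replace(".", "zzz")
    x = w.rstrip("!")
    y = x.replace("!", "zzz")
    z = y[:-5] + "zzz" + y[-5:]  # assumes the last word has exactly 5 characters
    return z.split("zzz")

print(split_on_markers("objects.y.too!:)bites!"))  # ['objects', 'y', 'too', ':)', 'bites']
print(split_on_markers("list."))                   # ['li', 'st', '']
```

Because the word boundaries are already gone by the time the string is joined, the function has to guess where to cut, and the guess only fits the first text.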




Here is the solution provided by the course instructor:

from string import ascii_lowercase


text = """
One really nice feature of Python is polymorphism: using the same operation
on different types of objects.
Let's talk about an elegant feature: slicing.
You can use this on a string as well as a list for example
'pybites'[0:2] gives 'py'.
The first value is inclusive and the last one is exclusive so
here we grab indexes 0 and 1, the letter p and y.
When you have a 0 index you can leave it out so can write this as 'pybites'[:2]
but here is the kicker: you can use this on a list too!
['pybites', 'teaches', 'you', 'Python'][-2:] would gives ['you', 'Python']
  and now you know about slicing from the end as well :)
keep enjoying our bites!
"""

def slice_and_dice(text: str = text) -> list:
    """Get a list of words from the passed in text.
       See the Bite description for step by step instructions"""
    results = []
    for line in text.strip().splitlines():
        line = line.lstrip()

        if line[0] not in ascii_lowercase:
            continue

        words = line.split()
        last_word_stripped = words[-1].rstrip('!.')
        results.append(last_word_stripped)

    return results
I don’t completely understand. Here is my best attempt at transcribing the Python code into plain English:

Quote:19 name function as slice_and_dice passin the text
20 doc string
21 doc string
22 declare results variable as empty list
23 split the lines and strip the text for every line
24 declare a variable as line with the left most character of the string and strip it ( but only if it is a space)
25 white space
26 if the first character of line is not a-z, then:
27 continue with the loop
28 white space
29 for the words variable, define it as every line but splitted
30 if the last word ends with ‘!.’, the strip it
31 append the last word to the results variable
32 white space
33 return the results

Would someone be able to add to or correct my understanding? Is there a better explanation for some of the lines I am struggling to understand?

At line 19, the function parameter is: text: str = text. How does this work? The variable text makes sense, but how and why does the string class method equal text? Also: What does -> list do?
(Jun-04-2020, 10:57 PM)knackwurstbagel Wrote:
# naive debug:
# print(f"First debug:{splitted}")

From Python 3.8 onward, the f-string '=' specifier can be used for this kind of 'naive debug':

>>> line = 'This is line'
>>> print(f'{line=}')
line='This is line'

(Jun-11-2020, 02:49 PM)Drone4four Wrote:
- check if the first character is lowercase,

Here is the solution provided by the course instructor:
from string import ascii_lowercase

As long as you live in a world where only 26 lowercase characters exist, this code will work as expected. However, in a world which resembles reality there are many more lowercase letters than 26. I believe that .islower() is a better choice for that world.
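A quick sketch of that difference, using a handful of characters picked only for illustration:

```python
from string import ascii_lowercase

samples = ["a", "é", "ß", "я", "A", "1"]  # illustrative characters

# Membership in ascii_lowercase only recognises the 26 letters a-z
ascii_only = [ch for ch in samples if ch in ascii_lowercase]

# str.islower() also recognises lowercase letters outside ASCII
unicode_aware = [ch for ch in samples if ch.islower()]

print(ascii_only)     # ['a']
print(unicode_aware)  # ['a', 'é', 'ß', 'я']
```

For the challenge text both checks give the same result, since it contains only ASCII letters.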
In the code I provided, replacing extend with append and printing results seemed to do the trick. I forgot how extend and append work; that was my mistake. In your latest code the whole 'zzz' thing is confusing. I would just use a single strip to get rid of the trailing periods and exclamation marks. I was reading your transcript and I would like to suggest a few changes.

Quote:24 declare variable line as line with spaces trimmed
30 if last word ends with any combination of ! and . strip it

Why? For 30, the docs say: "The chars argument is not a suffix; rather, all combinations of its values are stripped"

This can be confusing, so I devised a few experiments:

words = ['both!.', 'single.', 'reversed.!', 'exclamation!']
for word in words:
    print(word.rstrip('!.'))

# As comprehension
print([word.rstrip('!.') for word in words])
Can you guess the output before running this? In any case, if your 'zzz' method works that's great (I've not run that code), but it seems you're failing the second test, and I would say that instead of
Quote:re-write my code so that it automatically returns ['word', 'list', 'list'] based on the secondary text string
I would try to debug your code with a debugger, either one provided by an editor such as VS Code or PyCharm, or pdb, the one Python comes with. I personally use pdb++. It's worth learning and saves a lot of print statements, since usually when you're printing the value of a variable, it's because you suspect it does not contain what you think it does. In pdb you can set a breakpoint and then use display <variable_name> to track the value of that variable as it changes.

But in either case
Quote:At line 19, the function parameter is: text: str = text. How does this work? The variable text makes sense, but how and why does the string class method equal text? Also: What does -> list do?
is Python type hinting.

Python is dynamically typed, and these are just hints to the reader about what types the arguments are and, in the case of -> list, that the function returns a list. It is just a kind way of telling the programmer who is going to use your function that text is a str and that the function returns a list. The annotations can be safely removed. The = text part is separate: it is an ordinary default value for the argument.
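A small sketch of how the annotations and the default value behave. The function body here is a simplified stand-in, not the instructor's solution, and the sample text is made up:

```python
text = "default block of text"  # stand-in for the module-level text


def slice_and_dice(text: str = text) -> list:
    # Annotations are stored on the function but are not enforced at run time.
    return text.split()


print(slice_and_dice())               # no argument: falls back to the module-level text
print(slice_and_dice("other words"))  # an explicit argument takes precedence
print(slice_and_dice.__annotations__)
```

Removing ": str" and "-> list" leaves the behaviour unchanged. Removing "= text" would make the argument required.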