Python Forum

Full Version: generator function that yield from a list
Let's say we have a string like this: 'a b c' (simplified example)
so we can do
spam = 'a b c'
for ch in spam.split(' '):
    print(ch)


in this case str.split() will produce a list (i.e. it will generate the whole list in memory)

for the same result we can also do

def chars(eggs):
    for ch in eggs.split(' '):
        yield ch

spam = 'a b c'
for ch in chars(spam):
    print(ch)
in this case the generator function chars will not produce the list from eggs.split() in memory, right? Will it be evaluated lazily? I started to doubt myself today while answering a question on SO...
This is not an answer, but in the corner case (if you don't mind the spaces, or there are no spaces) one can yield directly:

>>> def chars(word):
...     yield from word
...
>>> for char in chars('abc'):
...     print(char)
...
a
b
c
>>> for char in chars('a b c'):
...     if char != ' ':
...         print(char)
...
a
b
c
As I said, this was a simplified example. Here is a link to my answer on SO. Note also my comment under it.

If it was simply a str, I would iterate directly over it (i.e. it's in memory anyway, so there is generally no benefit in creating a generator)
I think re.finditer() will work for this.
>>> import re
>>> 
>>> spam = 'a b c'
>>> for match in re.finditer(r'\S+', spam):
...     print(match.group())
...     
a
b
c
re.finditer() returns a true iterator and will not store all the matches in memory.
It should work for more complicated cases, as a regex pattern can be written for a lot of things.
So each value is only produced when next() or __next__() is called, just before it gets used.
>>> r = re.finditer(r'a', 'a')
>>> r
<callable_iterator object at 0x04C5FFB0>
>>> next(r)
<re.Match object; span=(0, 1), match='a'>


>>> r = re.finditer(r'a', 'a')
>>> r.__next__()
<re.Match object; span=(0, 1), match='a'>
@snippsat, thanks, but my question is more or less theoretical (please check also the SO)
basically, I ask: if we have (pseudocode)
def spam():
    for egg in <SOME LIST/TUPLE OBJECT HERE, e.g. returned by some function or method like str.split()>:
        yield egg
does Python evaluate the list/tuple when creating the generator, i.e. create the whole list in memory, or is it evaluated lazily, only when the next value is yielded? I think it's the latter
Only the yield will be lazy; the str.split() will return a full list.
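A minimal sketch that makes this visible. noisy_split and the log list are hypothetical helpers added only for illustration (they are not from the original posts); noisy_split records that it was called and then simply delegates to str.split().

```python
def noisy_split(text, log):
    # Hypothetical helper: record the call, then do what text.split(' ') does.
    log.append('split called')
    return text.split(' ')


def chars(eggs, log):
    # Same shape as the chars() generator function from the question.
    for ch in noisy_split(eggs, log):
        yield ch


log = []
gen = chars('a b c', log)
calls_before_first_next = list(log)  # generator body has not started running

first = next(gen)                    # body runs up to the first yield
calls_after_first_next = list(log)   # the split has now happened, once

rest = list(gen)                     # remaining items come from that same list
```

Creating the generator runs nothing, so no list exists at that point. The first next() evaluates the split expression once, and from then on the complete list is in memory while the generator hands out its items one at a time.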
(Jun-04-2019, 08:57 PM) Yoriz wrote: Only the yield will be lazy; the str.split() will return a full list.
so, if that is the case, it doesn't make sense to create the generator function in this particular case
This would be lazy, no extra list created.
def chars(eggs):
    for ch in eggs:
        if ch != " ":
            yield ch


spam = "a b c"
for ch in chars(spam):
    print(ch)
Output:
a
b
c
this was my actual case/answer on SO:

def get_addresses(input_string):
    for address in input_string.split(' BEG ')[-1].split(' END ')[0].split(' '):
        yield address

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(get_addresses(foo)):
    print(f'[{idx}]0x{address}')
They wanted an alternative to using regex to extract the address values between BEG and END and format them in a particular way.

The user also asked whether there is a performance benefit in using a generator function compared to iterating directly over foo.split(' BEG ')[-1].split(' END ')[0].split(' '). My comment was:
Quote: In this particular case (assuming you will not have many addresses) there is no practical difference. In the general case split() will produce a list in memory, while get_addresses is a generator and it will not produce the whole list in memory. In addition, it makes the code more structured and allows the generator function to be tested separately.
Then I had second thoughts and asked here... I should have posted the actual code from the start...
:-) Anyway, thanks a lot
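For comparison, here is a rough sketch of a split that really is lazy, using str.find() to locate one separator at a time. lazy_split is a hypothetical name, not a standard library function, and it is only meant to mirror text.split(sep) with an explicit separator. The input string itself is still fully in memory, so the only thing saved is the intermediate list.

```python
def lazy_split(text, sep=' '):
    # Hypothetical helper: yields the same pieces as text.split(sep),
    # but locates them one at a time instead of building a list first.
    if not sep:
        raise ValueError('empty separator')
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            # No more separators: the rest of the string is the last piece.
            yield text[start:]
            return
        yield text[start:end]
        start = end + len(sep)


pieces = list(lazy_split('a b c'))

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
# The two outer splits are kept as in get_addresses(); only the last one is lazy.
block = foo.split(' BEG ')[-1].split(' END ')[0]
addresses = list(lazy_split(block))
```

For a handful of addresses this makes no practical difference, as noted in the comment above; it would only start to matter for very long strings.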
Here it is with re.finditer(), so it's still regex, but an alternative way of using it.
addresses = btInfo.group().split()
for idx in range(len(addresses)):
So here @r0ng uses split() and range(len(addresses)) together with regex.

import re

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
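# Note: this pattern captures exactly two addresses after BEG.
for match in re.finditer(r'BEG\s(.*?)\s(.*?)\s', foo):
    # match.groups() is a tuple, so enumerate() can iterate it directly.
    for idx, address in enumerate(match.groups()):
        print(f'[{idx}]0x{address}')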
Output:
[0]0x701D135D
[1]0x702D72FC
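The pattern above is written for exactly two addresses. Below is a sketch for a variable number of them, assuming the addresses are whitespace-separated tokens between BEG and END. get_addresses_re is a hypothetical name, and this is one possible approach rather than the code from the SO answer.

```python
import re


def get_addresses_re(input_string):
    # Find the text between BEG and END (assumed to appear in that order).
    block = re.search(r'BEG\s+(.*?)\s+END', input_string)
    if block is None:
        return  # no BEG ... END section: the generator yields nothing
    # re.finditer() yields one match at a time instead of building a list.
    for match in re.finditer(r'\S+', block.group(1)):
        yield match.group()


foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
lines = [f'[{idx}]0x{address}'
         for idx, address in enumerate(get_addresses_re(foo))]
for line in lines:
    print(line)
```

The formatting stays outside the generator, so the same function can be reused or tested on its own.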