Posts: 8,151
Threads: 160
Joined: Sep 2016
Jun-04-2019, 02:00 PM
(This post was last modified: Jun-04-2019, 02:00 PM by buran.)
Let's say we have a string like this: 'a b c' (simplified example)
so we can do
spam = 'a b c'
for ch in spam.split(' '):
    print(ch)
in this case str.split() will produce a list (i.e. it will generate the whole list in memory)
for the same result we can also do
def chars(eggs):
    for ch in eggs.split(' '):
        yield ch
spam = 'a b c'
for ch in chars(spam):
    print(ch)

In this case the generator function chars will not produce the list from eggs.split() in memory, right? It will be evaluated lazily? I started to doubt myself today while answering an SO question...
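A quick sketch (my own demo, not from the post) that shows when the generator body, and therefore str.split(), actually runs:

```python
def chars(eggs):
    print('generator body started')   # runs on the first next(), not at call time
    for ch in eggs.split(' '):        # split() builds the whole list right here
        yield ch

gen = chars('a b c')      # nothing printed yet: the body is deferred
first = next(gen)         # 'generator body started' prints now
print(first)              # a
print(list(gen))          # ['b', 'c']
```

So the call itself is lazy, but the moment the first value is requested, split() materialises the full list anyway.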
Posts: 1,950
Threads: 8
Joined: Jun 2018
This is not an answer, but in a corner case (if you don't mind spaces / there are no spaces) one can yield directly:
>>> def chars(word):
...     yield from word
...
>>> for char in chars('abc'):
...     print(char)
...
a
b
c
>>> for char in chars('a b c'):
...     if char != ' ':
...         print(char)
a
b
c
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 8,151
Threads: 160
Joined: Sep 2016
As I said, this was a simplified example. Here is a link to my answer on SO. Note also my comment under it.
If it were simply a str, I would iterate directly over it (i.e. it's in memory anyway, so generally there is no benefit to creating a generator).
Posts: 7,312
Threads: 123
Joined: Sep 2016
I think re.finditer() will work for this.
>>> import re
>>>
>>> spam = 'a b c'
>>> for match in re.finditer(r'\S+', spam):
...     print(match.group())
...
a
b
c

re.finditer() is a true iterator and will not store the values in memory.
It should work for more complicated cases too, as a regex pattern can be written for a lot of stuff.
So next() and __next__() have to be called before values get produced.
>>> r = re.finditer(r'a', 'a')
>>> r
<callable_iterator object at 0x04C5FFB0>
>>> next(r)
<re.Match object; span=(0, 1), match='a'>
>>> r = re.finditer(r'a', 'a')
>>> r.__next__()
<re.Match object; span=(0, 1), match='a'>
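A small comparison sketch (mine, not from the post) of the eager and lazy regex APIs side by side:

```python
import re

spam = 'a b c'

# re.findall() is eager: the whole result list exists up front.
eager = re.findall(r'\S+', spam)
print(eager)                       # ['a', 'b', 'c']

# re.finditer() is lazy: match objects are produced one at a time, on demand.
lazy = re.finditer(r'\S+', spam)
print(next(lazy).group())          # a
print([m.group() for m in lazy])   # ['b', 'c']
```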
Posts: 8,151
Threads: 160
Joined: Sep 2016
Jun-04-2019, 08:54 PM
(This post was last modified: Jun-04-2019, 08:57 PM by buran.)
@snippsat, thanks, but my question is more or less theoretical (please check also the SO answer).
Basically, I ask: if we have (pseudocode)

def spam():
    for egg in <SOME LIST/TUPLE OBJECT HERE, e.g. returned by some function or method like str.split()>:
        yield egg

does Python evaluate the list/tuple when creating the generator, i.e. create the whole list in memory, or is it evaluated lazily, only when yielding the next value? I think it's the latter.
Posts: 2,168
Threads: 35
Joined: Sep 2016
Only the yield will be lazy; str.split() will return a full list.
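One way to see this (a sketch of my own, not from the thread): subclass str and have split() announce itself, which makes it visible that the full list is built as soon as the generator body starts running.

```python
class NoisyStr(str):
    """str subclass that announces how big a list split() builds."""
    def split(self, *args, **kwargs):
        result = super().split(*args, **kwargs)
        print(f'split() built a list of {len(result)} items')
        return result

def chars(eggs):
    for ch in eggs.split(' '):   # the whole list is created on the first next()
        yield ch

gen = chars(NoisyStr('a b c'))
print(next(gen))   # the split() message appears first, then: a
```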
Posts: 8,151
Threads: 160
Joined: Sep 2016
(Jun-04-2019, 08:57 PM)Yoriz Wrote: Only the yield will be lazy; str.split() will return a full list.

So, if that is the case, it doesn't make sense to create the generator function in this particular case.
Posts: 2,168
Threads: 35
Joined: Sep 2016
This would be lazy, no extra list created.
def chars(eggs):
    for ch in eggs:
        if ch != " ":
            yield ch
spam = "a b c"
for ch in chars(spam):
    print(ch)

Output:
a
b
c
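The same thing can be written as a generator expression (my shorthand, not from the post), which is equally lazy and builds no list:

```python
spam = "a b c"
lazy = (ch for ch in spam if ch != " ")  # generator expression: no list created
for ch in lazy:
    print(ch)   # a, b, c on separate lines
```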
Posts: 8,151
Threads: 160
Joined: Sep 2016
Jun-04-2019, 09:18 PM
(This post was last modified: Jun-04-2019, 09:18 PM by buran.)
this was my actual case/answer on SO:
def get_addresses(input_string):
    for address in input_string.split(' BEG ')[-1].split(' END ')[0].split(' '):
        yield address
foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(get_addresses(foo)):
    print(f'[{idx}]0x{address}')

They wanted an alternative to using regex to extract the address values between BEG and END and format them in a particular way.
The user asked if there is a performance benefit in using the generator function compared to iterating directly over foo.split(' BEG ')[-1].split(' END ')[0].split(' '). My comment was:
Quote: in this particular case (assuming you will not have many addresses) there is no practical difference. In the general case split() will produce a list in memory, while get_addresses is a generator and will not produce the whole list in memory. In addition, it makes the code more structured and allows testing the generator function separately.
Then I had second thoughts and asked here... I should have posted the actual code from the start...
:-) Anyway, thanks a lot
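For what it's worth, a fully lazy variant could be sketched with re.finditer(), so that no intermediate lists are created at all. This is my own sketch, not the SO answer, and it assumes the single BEG ... END layout of the example string; the slice still copies a substring, but no list is materialised:

```python
import re

def get_addresses(input_string):
    # locate the last ' BEG ' and the following ' END ', mirroring what
    # split(' BEG ')[-1].split(' END ')[0] selected
    beg = input_string.rindex(' BEG ') + len(' BEG ')
    end = input_string.index(' END ', beg)
    # finditer() yields one match at a time instead of building a list
    for match in re.finditer(r'\S+', input_string[beg:end]):
        yield match.group()

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(get_addresses(foo)):
    print(f'[{idx}]0x{address}')   # [0]0x701D135D then [1]0x702D72FC
```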
Posts: 7,312
Threads: 123
Joined: Sep 2016
Jun-04-2019, 10:26 PM
(This post was last modified: Jun-04-2019, 10:26 PM by snippsat.)
Here it is with re.finditer(), so it's still regex, but an alternative way of using it.
@r0ng's answer on SO used split() and range(len(addresses)):

addresses = btInfo.group().split()
for idx in range(len(addresses)):

So here I use that approach together with regex.
import re
foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for match in re.finditer(r'BEG\s(.*?)\s(.*?)\s', foo):
    for idx, address in enumerate(iter(match.groups())):
        print(f'[{idx}]0x{address}')

Output:
[0]0x701D135D
[1]0x702D72FC