Mar-22-2019, 02:01 PM
Another solution, which supports deeply nested iterables.
```python
from collections import deque


def extract_words(mixture):
    """
    Flatten deeply nested iterables and split strings if they occur.
    """
    stack = deque([mixture])
    # using a stack allows iterating over deeply nested lists
    # it could be done more easily with recursion,
    # but Python has a recursion limit
    to_split = (str, bytes)
    # we want to split str and bytes
    while stack:
        # loop runs until stack is empty
        current = stack.popleft()
        # in the first iteration the stack is now empty
        # and current holds the original mixture
        if isinstance(current, to_split):
            # split if the current object is a str or bytes
            yield from current.split()
        else:
            # this branch is executed if the current object
            # is not a str or bytes
            try:
                current = iter(current)
                # iter of an iterator returns the same iterator
                subelement = next(current)
                # next does what the for loop does,
            except StopIteration:
                # but we have to check for exhaustion manually
                pass
            except TypeError:
                # if an element is not iterable, iter() raises
                # TypeError. Integers, for example, are not
                # iterable. current is still the original object
                # here, because the assignment never happened.
                yield current
            else:
                # if no error happens, put the current iterator back
                # on the left side of the stack
                stack.appendleft(current)
                # put the subelement at the beginning of the deque
                stack.appendleft(subelement)
```
This is a generator. If you call it, you have to iterate over it.
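For comparison, here is a rough sketch of the recursive variant that the comments in the code above mention. The names `extract_words_recursive` and `nest` are made up for this illustration and are not part of the solution above. It is shorter, but every nesting level costs one frame, so input nested deeper than the recursion limit raises a RecursionError.

```python
import sys


def extract_words_recursive(mixture):
    """Recursive sketch: flatten nested iterables and split str/bytes."""
    if isinstance(mixture, (str, bytes)):
        yield from mixture.split()
        return
    try:
        iterator = iter(mixture)
    except TypeError:
        # not iterable (for example an int): pass it through unchanged
        yield mixture
        return
    for element in iterator:
        # one extra generator frame per nesting level
        yield from extract_words_recursive(element)


def nest(value, depth):
    """Hypothetical helper: wrap value in `depth` nested lists."""
    for _ in range(depth):
        value = [value]
    return value


data = ['Andre Müller', ['hallo', '123'], [[[[['foo bar']]], 'bat']], 12]
print(list(extract_words_recursive(data)))

try:
    list(extract_words_recursive(nest('word', sys.getrecursionlimit() + 100)))
except RecursionError:
    print('RecursionError: nested deeper than the recursion limit')
```

The deque version does not have this problem, because the nesting depth only makes the deque longer instead of the call stack deeper.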
```python
data = ['Andre Müller', ['hallo', '123'], [[[[['foo bar']]], 'bat']], 12]

extractor = extract_words(data)
# nothing happens yet; now you can decide whether you want
# a list, set, tuple, dict or something different.
result = list(extractor)
# now the generator extractor is exhausted, you can't use it again,
# but you can make a new generator and use it, for example, in a for loop
for element in extract_words(data):
    print(element)
```
The benefit is that the caller decides which container type the elements are put into.
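A small sketch of what "the caller decides" can look like in practice. To keep the snippet runnable on its own, it uses a tiny stand-in generator called `sample_words` (a made-up name, only for this illustration); any generator behaves the same way.

```python
from collections import Counter
from itertools import islice


def sample_words():
    """Stand-in generator for this example only."""
    yield from 'foo bar foo bat'.split()


as_tuple = tuple(sample_words())             # keeps order and duplicates
as_set = set(sample_words())                 # drops duplicates
counts = Counter(sample_words())             # counts each word
first_two = list(islice(sample_words(), 2))  # takes only the first two items

gen = sample_words()
list(gen)             # consumes the generator
leftover = list(gen)  # an exhausted generator yields nothing more
```

Each call to `sample_words()` creates a fresh generator, which is why every line above calls it again.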
You can also pipe one generator into the next.
The code above also lets integers through. If you want to filter them out, you can build a pipeline.
```python
def filter_non_iterable(iterable):
    allowed = (tuple, list, set, str, bytes)
    for element in iterable:
        if isinstance(element, allowed):
            yield element


filtered = filter_non_iterable(data)
result = extract_words(filtered)
# still nothing happens here
print(list(result))
# now the first generator is consumed by the second generator.
```
The first one filters, the second one does the flatten and split task.
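Keep in mind that filter_non_iterable only looks at the top-level elements, so an integer sitting inside a nested list still comes through. One possible way around this is to put a filter at the end of the pipeline instead. The sketch below uses a made-up name, `only_text`, and a hand-written list `extracted` as a stand-in for the already flattened output, so that it runs on its own.

```python
def only_text(items):
    """Yield only str and bytes objects, drop everything else."""
    for item in items:
        if isinstance(item, (str, bytes)):
            yield item


# stand-in for the flattened output of the previous stage
extracted = ['Andre', 'Müller', 12, 'foo', 3.5, b'raw']
texts = list(only_text(extracted))
print(texts)
```

Because `only_text` accepts any iterable, it can also wrap the generator from the previous stage directly, and the whole pipeline stays lazy until you iterate over it.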
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!