So I have a list L. To do some parallel processing, I want to split it into N sublists of roughly equal length. It doesn't matter which items end up in which sublist. What is your shortest and/or most pythonic code? Can you avoid using len(L) explicitly?
M = [L[start::N] for start in range(N)]
Assumes N divides evenly into len(L). If that's a concern you could put a second index into the slice or double zip it to account for that.
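For example, on a small illustrative list (sample values are my own, not from the thread), the stride slicing produces N sublists whose lengths differ by at most one:

```python
L = list(range(10))  # sample data
N = 3

# One sublist per starting offset; each takes every N-th item.
M = [L[start::N] for start in range(N)]
print(M)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```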
So you want to chunk?
Lots of good answers here:
http://stackoverflow.com/questions/31244...zed-chunks
The standard idiom people generally use is:
chunks = (L[i:i + N] for i in range(0, len(L), N))
This assumes the whole sequence fits in memory which it sounds like you are trying to avoid.
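As a quick sketch of that idiom on a small example list of my own:

```python
L = list(range(7))  # sample data
N = 3

# Successive slices of length N; the last chunk may be shorter.
chunks = (L[i:i + N] for i in range(0, len(L), N))
chunk_list = list(chunks)
print(chunk_list)  # [[0, 1, 2], [3, 4, 5], [6]]
```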
Alternatives use iter and zip and look like this:
chunks_zip = zip(*[iter(L)]*N)
The second version above ignores groups that are not full, but it can handle generators that wouldn't otherwise fit in memory:
Python3
L = range(10**25)
N = 3
chunks_zip = zip(*[iter(L)]*N)
for i in range(10):
    print(next(chunks_zip))
Output:
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, 10, 11)
(12, 13, 14)
(15, 16, 17)
(18, 19, 20)
(21, 22, 23)
(24, 25, 26)
(27, 28, 29)
Use zip_longest with a fill value if you don't want to lose incomplete groups.
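A small sketch of that zip_longest variant (zip_longest is in the standard library's itertools; the sample list is my own):

```python
from itertools import zip_longest

L = list(range(7))  # sample data
N = 3

# Same iterator trick, but incomplete groups are padded with the fill value.
grouped = list(zip_longest(*[iter(L)] * N, fillvalue=None))
print(grouped)  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```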
(Jan-12-2017, 01:26 AM)ichabod801 Wrote: [ -> ]M = [L[start::N] for start in range(N)]
Assumes N divides evenly into len(L). If that's a concern you could put a second index into the slice or double zip it to account for that.
This is more or less what my code does, but with slightly clumsier Python. And as far as I can tell this works even if N doesn't divide len(L).
(Jan-12-2017, 02:41 AM)Mekire Wrote: [ -> ]So you want to chunk?
Lots of good answers here:
http://stackoverflow.com/questions/31244...zed-chunks
The standard idiom people generally use is:
chunks = (L[i:i + N] for i in range(0, len(L), N))
This assumes the whole sequence fits in memory which it sounds like you are trying to avoid.
No, in the real-life problem I am handling rather small lists; it's just that each item ends up used in a web service call, so to speed things up I create N threads and give each a part of the original list.
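A minimal sketch of that setup, combining the stride split with a thread pool; call_webservice is a hypothetical stand-in for the real call, not something from the thread:

```python
from concurrent.futures import ThreadPoolExecutor

def call_webservice(item):
    # Placeholder for the real web service call (assumption for illustration).
    return item * 2

def worker(sublist):
    # Each thread processes one sublist item by item.
    return [call_webservice(item) for item in sublist]

L = list(range(10))  # sample data
N = 3

# Give each of the N workers one stride sublist of L.
with ThreadPoolExecutor(max_workers=N) as pool:
    results = list(pool.map(worker, (L[start::N] for start in range(N))))

print(results)  # [[0, 6, 12, 18], [2, 8, 14], [4, 10, 16]]
```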
(Jan-12-2017, 02:41 AM)Mekire Wrote: [ -> ]Alternatives use iter and zip and look like this:
chunks_zip = zip(*[iter(L)]*N)
The second version above ignores groups that are not full, but it can handle generators that wouldn't otherwise fit in memory:
Python3
L = range(10**25)
N = 3
chunks_zip = zip(*[iter(L)]*N)
for i in range(10):
    print(next(chunks_zip))
Output:
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, 10, 11)
(12, 13, 14)
(15, 16, 17)
(18, 19, 20)
(21, 22, 23)
(24, 25, 26)
(27, 28, 29)
Use zip_longest with a fill value if you don't want to lose incomplete groups.
Clever, but if you are unlucky, one of the chunks produced by the idioms above can have a length of 1, so the maximum size difference is N-1. With the one-every-N sampling you get more uniform sizes (but possibly using more CPU/memory, which isn't really a concern for me but could be for someone else).
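To illustrate that size difference on a small list of my own choosing:

```python
L = list(range(7))  # sample data
N = 3

# Consecutive chunks of size N: the last chunk can be much shorter.
chunked = [L[i:i + N] for i in range(0, len(L), N)]
chunk_sizes = [len(c) for c in chunked]
print(chunk_sizes)  # [3, 3, 1] -- sizes can differ by up to N-1

# One-every-N sampling: sublist sizes differ by at most 1.
strided = [L[start::N] for start in range(N)]
stride_sizes = [len(s) for s in strided]
print(stride_sizes)  # [3, 2, 2]
```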
There is an example in the Python documentation:
https://docs.python.org/3.5/library/iter...ls-recipes
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
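A quick run of that recipe, reproduced here self-contained (zip_longest comes from itertools):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

groups = list(grouper('ABCDEFG', 3, 'x'))
print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```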
(Jan-12-2017, 09:13 AM)wavic Wrote: [ -> ]There is an example in the Python documentation: https://docs.python.org/3.5/library/iter...ls-recipes
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
Yes, this is Mekire's suggestion...
(Jan-12-2017, 08:10 AM)Ofnuts Wrote: [ -> ]This is more or less what my code does, but with slightly clumsier Python. And as far as I can tell this works even if N doesn't divide len(L).
You said you wanted lists of equivalent length. I took that to mean equal length. If N does not divide evenly into len(L), then some of the lists will be one item longer. The fixes I proposed would make them all the same length, but they would drop some items.
How about this? It looks simple enough for everyone.