Python Forum

Full Version: Code golfing: splitting a list
So I have a list L. To do some parallel processing, I want to split it into N sublists of equivalent length. The contents of the sublists are indifferent. What is your shortest and/or most pythonic code? Can you avoid using len(L) explicitly? 

python python python

Assumes N divides evenly into len(L). If that's a concern, you could put a second index into the slice or double zip it to account for that.
So you want to chunk?
Lots of good answers here:
http://stackoverflow.com/questions/31244...zed-chunks

The standard idiom people generally use is:
chunks = (L[i:i + N] for i in range(0, len(L), N))
This assumes the whole sequence fits in memory which it sounds like you are trying to avoid.

Alternatives use iter and zip and look like this:
chunks_zip = zip(*[iter(L)]*N)
The second version above drops groups that are not full, but it can handle iterables that wouldn't otherwise fit in memory:
Python3
L = range(10**25)
N = 3

chunks_zip = zip(*[iter(L)]*N)

for i in range(10):
    print(next(chunks_zip))
Output:
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, 10, 11)
(12, 13, 14)
(15, 16, 17)
(18, 19, 20)
(21, 22, 23)
(24, 25, 26)
(27, 28, 29)
Use zip_longest with a fill value if you don't want to lose incomplete groups.
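For illustration, here is a minimal sketch of the zip_longest variant on a small list. The sentinel name _MISSING is my own choice, not something from the posts above; it is only there so that padding can be stripped again without confusing it with real None items.

```python
from itertools import zip_longest

L = list(range(10))
N = 3  # chunk size

# The same iterator is repeated N times, so each tuple pulls N consecutive items.
# The last group is padded with the fill value instead of being dropped.
padded = list(zip_longest(*[iter(L)] * N, fillvalue=None))

# Optional: strip the padding again. A unique sentinel object (hypothetical name)
# is used so that genuine None values in L would survive.
_MISSING = object()
chunks = [
    [x for x in group if x is not _MISSING]
    for group in zip_longest(*[iter(L)] * N, fillvalue=_MISSING)
]
```

With these values the last padded group is (9, None, None), and the stripped version ends with the short chunk [9].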
(Jan-12-2017, 01:26 AM)ichabod801 Wrote: [ -> ]

Assumes N divides evenly into len(L). If that's a concern, you could put a second index into the slice or double zip it to account for that.

This is more or less what my code does, but with slightly clumsier Python. And as far as I can tell, this works even if N doesn't divide len(L).

(Jan-12-2017, 02:41 AM)Mekire Wrote: [ -> ]So you want to chunk?
Lots of good answers here:
http://stackoverflow.com/questions/31244...zed-chunks

The standard idiom people generally use is:
chunks = (L[i:i + N] for i in range(0, len(L), N))
This assumes the whole sequence fits in memory which it sounds like you are trying to avoid.
No, in the real-life problem I am handling rather small lists. It's just that each item ends up being used in a web service call, so to speed things up I create N threads and give each one a part of the original list.
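A rough sketch of that setup, assuming a thread pool from concurrent.futures and strided slices for the split. The functions call_service and process_part are hypothetical stand-ins for the real web service code, which isn't shown in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def call_service(item):
    # Hypothetical stand-in for the real web service call.
    return item * 2


def process_part(part):
    # Each worker thread handles one sublist sequentially.
    return [call_service(item) for item in part]


L = list(range(10))
N = 3  # number of threads, and therefore number of sublists

# One sublist per thread: sublist i takes items i, i+N, i+2N, ...
parts = [L[i::N] for i in range(N)]

with ThreadPoolExecutor(max_workers=N) as pool:
    # map() returns results in the same order as parts.
    results = list(pool.map(process_part, parts))
```

If the split only exists to feed the threads, mapping call_service over L directly with max_workers=N might make the split unnecessary, since the pool hands out items as workers become free.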

(Jan-12-2017, 02:41 AM)Mekire Wrote: [ -> ]Alternatives use iter and zip and look like this:
chunks_zip = zip(*[iter(L)]*N)
The second version above drops groups that are not full, but it can handle iterables that wouldn't otherwise fit in memory:
Python3
L = range(10**25)
N = 3

chunks_zip = zip(*[iter(L)]*N)

for i in range(10):
    print(next(chunks_zip))
Output:
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
(9, 10, 11)
(12, 13, 14)
(15, 16, 17)
(18, 19, 20)
(21, 22, 23)
(24, 25, 26)
(27, 28, 29)
Use zip_longest with a fill value if you don't want to lose incomplete groups.
Clever, but if you are unlucky, the last chunk produced by the idioms above can have a length of 1, so the maximum size difference is N-1. Sampling one item in every N gives more uniform sizes (possibly at the cost of more CPU/memory, which isn't really a concern for me but could be for someone else).
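To make the size difference concrete, here is a small comparison. The strided slice L[i::N] is my reading of "one item in every N" and not necessarily the exact code from the first post; the ceiling-division chunk size is likewise just one way to get at most N contiguous chunks.

```python
L = list(range(10))
N = 4  # number of sublists wanted

# Strided sampling: sublist i takes items i, i+N, i+2N, ...
strided = [L[i::N] for i in range(N)]

# Contiguous chunks, sized by ceiling division so there are at most N of them.
size = -(-len(L) // N)
contiguous = [L[i:i + size] for i in range(0, len(L), size)]

strided_sizes = [len(part) for part in strided]
contiguous_sizes = [len(part) for part in contiguous]
```

Here the strided sublists have sizes 3, 3, 2, 2, while the contiguous chunks have sizes 3, 3, 3, 1.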
There is an example in the Python documentation: https://docs.python.org/3.5/library/iter...ls-recipes

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
(Jan-12-2017, 09:13 AM)wavic Wrote: [ -> ]There is an example in the Python documentation: https://docs.python.org/3.5/library/iter...ls-recipes

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Yes, this is Mekire's suggestion...
(Jan-12-2017, 08:10 AM)Ofnuts Wrote: [ -> ]This is more or less what my code does, but with slightly clumsier Python. And as far as I can tell, this works even if N doesn't divide len(L).

You said you wanted lists of equivalent length. I took that to mean equal length. If N does not divide evenly into len(L), then some of the lists will be one item longer. The fixes I proposed would make them all the same length, but they would drop some items.
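As a sketch of what such fixes could look like (this is a guess at the two ideas, "second index into the slice" and "double zip", not the original code), both versions below return N sublists of exactly equal length and drop the leftover items:

```python
L = list(range(10))
N = 3  # number of sublists

# Second index in the slice: stop at the largest multiple of N.
stop = len(L) - len(L) % N
sliced = [L[i:stop:N] for i in range(N)]

# Double zip: the inner zip groups N consecutive items and drops the incomplete
# tail; the outer zip transposes those groups into N equal-length sublists.
groups = zip(*[iter(L)] * N)
double_zipped = [list(part) for part in zip(*groups)]
```

With 10 items and N = 3, the item 9 is dropped in both cases. The double zip version also avoids calling len(L).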
How about this

It looks simple enough for everyone.