Python Forum

Full Version: splitting a string numerically
I am looking for something that can split a string at a fixed interval rather than by content.

a = '0123456789abcdefghijklmnopqrstuvwxyz'
b = skapsplit( a, 4 )
c = skapsplit( a, 6 )
d = skapsplit( a, 18 )
would get:
b = [ '0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz' ]
c = [ '012345', '6789ab', 'cdefgh', 'ijklmn', 'opqrst', 'uvwxyz' ]
d = [ '0123456789abcdefgh', 'ijklmnopqrstuvwxyz' ]
Should I continue looking, or start coding?
Sorry for not being more creative, but I found an answer (hopefully) here:
http://stackoverflow.com/questions/18854...k-with-the
I too was curious whether there is already a function available to perform this task, and wasn't lucky =)
I found this:

http://stackoverflow.com/questions/94752...-character

There is some zip() magic there which I don't understand :)
As shown in the link in @j.crater's post, the standard chunking idiom is:
>>> sequence
'abcdefghijklmnopqrstuvwxyz'
>>> chunk = 5
>>> [sequence[i:i+chunk] for i in range(0, len(sequence), chunk)]
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
>>>
Preferably wrap it in a function and return a generator:
def chunk_split(sequence, chunk):
    return (sequence[i:i+chunk] for i in range(0, len(sequence), chunk))
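A quick usage check for the generator version, using the string from the original question. Since the function returns a lazy generator, wrap the call in list() to see all the pieces at once. The variable names b, d and nums are only for illustration.

```python
def chunk_split(sequence, chunk):
    """Yield successive slices of `sequence`, each at most `chunk` items long."""
    return (sequence[i:i + chunk] for i in range(0, len(sequence), chunk))


a = '0123456789abcdefghijklmnopqrstuvwxyz'

# The generator is lazy, so materialize it with list() to inspect the result.
b = list(chunk_split(a, 4))
d = list(chunk_split(a, 18))

# Slicing works on any sequence, not just strings.
nums = list(chunk_split([1, 2, 3, 4, 5], 2))

print(b)
print(d)
print(nums)
```

The last piece is simply shorter when the length is not a multiple of the chunk size, as with the trailing [5] in the list example.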
my pyutils.py module just grew.
(Oct-07-2016, 07:00 AM) wavic wrote: I found this:

http://stackoverflow.com/questions/94752...-character

There is some zip() magic there which I don't understand :)

From SO, in slow motion

>>> s = '1234567890'
>>> [iter(s)]*2
[<iterator object at 0x03581AB0>, <iterator object at 0x03581AB0>]
>>> list(zip(*[iter(s)]*2))  # list() is needed on Python 3, where zip is lazy
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8'), ('9', '0')]
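Put simply, zip consumes elements from the same iterator object. Both entries in the list are one and the same iterator, so each tuple ends up with two consecutive characters.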
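To see the same thing without zip doing the work, here is a small sketch that pulls from the shared iterator by hand. The names it, pair, first, second and rest are only for illustration.

```python
s = '1234567890'

it = iter(s)
pair = [it] * 2  # two references to the SAME iterator, not two iterators
same_object = pair[0] is pair[1]

# zip takes one item from each argument in turn. Since both arguments are the
# same iterator, each tuple ends up holding two consecutive characters.
first = (next(pair[0]), next(pair[1]))
second = (next(pair[0]), next(pair[1]))

# Whatever is left in the iterator gets paired up the same way by zip.
rest = list(zip(*pair))

print(same_object, first, second, rest)
```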
The grouper function can be found in the itertools recipes of the Python help file along with other useful recipes.
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n  # n references to one shared iterator
    return zip_longest(*args, fillvalue=fillvalue)
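For strings, the tuples that grouper produces still have to be joined back together. A minimal sketch, assuming Python 3 (where the function is called zip_longest) and an example string whose length is not a multiple of 4. Passing an empty string as fillvalue is my own choice here, so the padding disappears after the join.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


a = '0123456789abcdefghijklmnopqrstuvwxyz23'

# An empty-string fillvalue pads the last group without adding characters.
chunks = [''.join(group) for group in grouper(a, 4, fillvalue='')]

# With a visible fillvalue the last group is padded to full length.
padded = [''.join(group) for group in grouper('ABCDEFG', 3, 'x')]

print(chunks)
print(padded)
```

Unlike plain zip, zip_longest keeps the leftover characters instead of dropping them.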
Oh, I finally get it. [<iterator object at 0x03581AB0>, <iterator object at 0x03581AB0>] opened my eyes. I didn't know that I could create a list of objects like that. :D
Does [obj]*n work with any class or function?
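On the [obj]*n question: as far as I know it works with any object, but it repeats the reference n times rather than copying the object. A small sketch with a list makes the sharing visible. The names inner, refs and separate are only for illustration.

```python
inner = []
refs = [inner] * 3  # three references to one list, not three lists

refs[0].append('x')  # mutating through one reference shows up in all of them
all_same = all(item is inner for item in refs)

# A comprehension builds a new object on each pass instead.
separate = [[] for _ in range(3)]
separate[0].append('x')

print(refs, all_same)
print(separate)
```

That sharing is exactly what the zip() trick relies on, and also why [[]]*n is a common surprise when independent lists were intended.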
So, in this case, the zip() would be...
>>> a = '0123456789abcdefghijklmnopqrstuvwxyz'
>>> [''.join(chunk) for chunk in zip(*[iter(a)]*4)]

['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz']
Which works in this specific case, because the length of the string is evenly divisible by 4... if we add a couple more characters...
>>> a = '0123456789abcdefghijklmnopqrstuvwxyz23'
>>> [''.join(chunk) for chunk in zip(*[iter(a)]*4)]
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz']
...the last two characters are dropped, because zip stops as soon as it cannot fill a complete tuple. So if you only want full-sized chunks, zip would work. If you want the whole string, in pieces no longer than 4, zip won't do the job. The previously mentioned range()/slice solution works fine, though (and is 100% easier to read).
>>> chunk = 4
>>> [a[i:i+chunk] for i in range(0, len(a), chunk)]
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz', '23']

Or, if you like regular expressions, this is also a pretty readable way to do it: 
>>> import re
>>> a
'0123456789abcdefghijklmnopqrstuvwxyz23'
>>> re.findall('.{1,4}', a)
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz', '23']
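One caveat with the regex version, in case the input can contain newlines: by default . does not match a newline, so a chunk ends early at each line break and the newline itself is left out. A sketch with the re.DOTALL flag, wrapped in a hypothetical helper named regex_chunks (not something from the re module):

```python
import re


def regex_chunks(text, size):
    """Split `text` into pieces of at most `size` characters using a regex."""
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    return re.findall('.{1,%d}' % size, text, flags=re.DOTALL)


a = '0123456789abcdefghijklmnopqrstuvwxyz23'
chunks = regex_chunks(a, 4)

# Same input, with and without the flag, to show the difference.
with_flag = regex_chunks('ab\ncd', 4)
without_flag = re.findall('.{1,4}', 'ab\ncd')

print(chunks)
print(with_flag)
print(without_flag)
```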
This is a good post