Python Forum
splitting a string numerically
#1
I am looking for something that can split a string at a specific interval (every n characters) rather than by content. For example:

a = '0123456789abcdefghijklmnopqrstuvwxyz'
b = skapsplit( a, 4 )
c = skapsplit( a, 6 )
d = skapsplit( a, 18 )
would give:
b = [ '0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz' ]
c = [ '012345', '6789ab', 'cdefgh', 'ijklmn', 'opqrst', 'uvwxyz' ]
d = [ '0123456789abcdefgh', 'ijklmnopqrstuvwxyz' ]
Should I continue looking or start coding?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Sorry for not being more creative, but I found an answer (hopefully) here:
http://stackoverflow.com/questions/18854...k-with-the
I too was curious whether there is already a function available to perform this task, and wasn't lucky =)
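Newer Python versions do include such a function: itertools.batched, added in Python 3.12, yields tuples of up to n items with a shorter final batch. A minimal sketch, assuming you may also need to run on older versions, falls back to the islice-based generator that the itertools recipes used before batched became built in. The name skapsplit is just the hypothetical one from the first post.

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Yield successive tuples of up to n items (fallback for Python < 3.12)."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # islice pulls at most n items from the one shared iterator on each pass
        while batch := tuple(islice(it, n)):
            yield batch


def skapsplit(s, n):
    """Split string s into pieces of n characters; the last piece may be shorter."""
    # batched yields tuples of characters, so join each one back into a string
    return [''.join(batch) for batch in batched(s, n)]


a = '0123456789abcdefghijklmnopqrstuvwxyz'
print(skapsplit(a, 6))
# ['012345', '6789ab', 'cdefgh', 'ijklmn', 'opqrst', 'uvwxyz']
```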
#3
I found this:

http://stackoverflow.com/questions/94752...-character

There is some zip() magic there which I don't understand :)
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#4
As shown in the link in @j.crater's post, the standard chunking idiom is:
>>> sequence
'abcdefghijklmnopqrstuvwxyz'
>>> chunk = 5
>>> [sequence[i:i+chunk] for i in range(0, len(sequence), chunk)]
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
>>>
Preferably wrap it in a function and return a generator:
def chunk_split(sequence, chunk):
    return (sequence[i:i+chunk] for i in range(0, len(sequence), chunk))
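Because chunk_split returns a generator, nothing is sliced until you iterate over it; wrap it in list() when you need the whole result at once. A quick check against the examples from the first post (a minimal sketch; the expected values are simply the ones given there):

```python
def chunk_split(sequence, chunk):
    # Lazily yield consecutive slices of length `chunk`; the last may be shorter
    return (sequence[i:i+chunk] for i in range(0, len(sequence), chunk))


a = '0123456789abcdefghijklmnopqrstuvwxyz'
b = list(chunk_split(a, 4))
c = list(chunk_split(a, 6))
d = list(chunk_split(a, 18))
print(d)  # ['0123456789abcdefgh', 'ijklmnopqrstuvwxyz']

# Slicing works on any sequence, so lists and bytes are chunked the same way
print(list(chunk_split([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
print(list(chunk_split(b'abcdef', 4)))        # [b'abcd', b'ef']
```

Since slicing preserves the type of the input, a string comes back as a list of strings with no ''.join() needed.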
#5
My pyutils.py module just grew.
#6
(Oct-07-2016, 07:00 AM)wavic Wrote: I found this:

http://stackoverflow.com/questions/94752...-character

There is some zip() magic there which I don't understand :)

From SO, in slow motion

>>> s = '1234567890'
>>> [iter(s)]*2
[<iterator object at 0x03581AB0>, <iterator object at 0x03581AB0>]
>>> zip(*[iter(s)]*2)
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8'), ('9', '0')]
Simply put, zip() consumes elements from the same iterator object, so each tuple takes the next two elements in turn.
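The session above appears to be Python 2, where zip() returns a list. In Python 3, zip() returns a lazy zip object, so wrap it in list() to see the pairs. A minimal Python 3 sketch that also spells out what the shared iterator is doing (the names it, args and pairs are just for illustration):

```python
s = '1234567890'

it = iter(s)
args = [it] * 2              # a list holding the SAME iterator object twice
print(args[0] is args[1])    # True

# zip(*args) is zip(it, it): each tuple takes one item, then the next, from `it`
print(list(zip(*args)))
# [('1', '2'), ('3', '4'), ('5', '6'), ('7', '8'), ('9', '0')]

# The same thing by hand: both "columns" advance the one shared iterator
it = iter(s)
pairs = []
while True:
    try:
        pairs.append((next(it), next(it)))
    except StopIteration:
        break                # an unpaired leftover item is dropped, as zip does
print(pairs == list(zip(*[iter(s)] * 2)))  # True
```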
#7
The grouper function can be found in the itertools recipes section of the Python documentation, along with other useful recipes. It needs zip_longest imported:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # n references to ONE iterator, so each tuple holds n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
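Applied to the original question, grouper pads the last chunk instead of dropping it. Passing fillvalue='' pads with empty strings, so ''.join() gives the same result as slicing; with the default fillvalue=None the join would raise a TypeError on the padding. A minimal sketch:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to ONE iterator, so each tuple holds n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Padding with 'x', as in the recipe's comment
print([''.join(g) for g in grouper('ABCDEFG', 3, 'x')])   # ['ABC', 'DEF', 'Gxx']

# Padding with '' keeps the short final chunk short
print([''.join(g) for g in grouper('0123456789', 4, '')]) # ['0123', '4567', '89']
```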
#8
(Oct-07-2016, 08:19 AM)buran Wrote:
From SO, in slow motion

>>> s = '1234567890'
>>> [iter(s)]*2
[<iterator object at 0x03581AB0>, <iterator object at 0x03581AB0>]
>>> zip(*[iter(s)]*2)
[('1', '2'), ('3', '4'), ('5', '6'), ('7', '8'), ('9', '0')]
Simply put, zip() consumes elements from the same iterator object, so each tuple takes the next two elements in turn.

Oh, I finally get it. [<iterator object at 0x03581AB0>, <iterator object at 0x03581AB0>] opened my eyes. I didn't know that I could create a list of objects like that. :D
Does [obj]*n work with any class or function?
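It does: [obj] * n works for any object, because it builds a list of n references to that one object; nothing is copied. That is exactly why the zip trick works, and it is also a classic pitfall with mutable objects. A short illustration (Thing, things, funcs and grid are just illustrative names):

```python
class Thing:
    pass


things = [Thing()] * 3
print(things[0] is things[2])  # True: three references to one instance

funcs = [len] * 2
print(funcs[1]('abc'))         # 3: functions are objects too

# Pitfall: a list of lists built this way shares one inner list
grid = [[]] * 3
grid[0].append('x')
print(grid)                    # [['x'], ['x'], ['x']]

# Use a comprehension when each element should be a separate object
grid = [[] for _ in range(3)]
grid[0].append('x')
print(grid)                    # [['x'], [], []]
```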
#9
So, in this case, the zip() version would be...
>>> a = '0123456789abcdefghijklmnopqrstuvwxyz'
>>> [''.join(chunk) for chunk in zip(*[iter(a)]*4)]
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz']
This works in this specific case because the string's length (36) is evenly divisible by 4... but if we add a couple more characters...
>>> a = '0123456789abcdefghijklmnopqrstuvwxyz23'
>>> [''.join(chunk) for chunk in zip(*[iter(a)]*4)]
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz']
...the last two characters are silently dropped. So if you want evenly sized chunks, zip works. If you want the whole string, in pieces no longer than 4, zip won't do the job. The previously mentioned range()/slice solution works fine, though (and is 100% easier to read).
>>> chunk = 4
>>> [a[i:i+chunk] for i in range(0, len(a), chunk)]
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz', '23']
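To make the difference concrete: slicing keeps every character, so joining the chunks rebuilds the original string, while the zip version loses the len(a) % 4 leftover characters. A small check using the 38-character string from above (zipped and sliced are just illustrative names):

```python
a = '0123456789abcdefghijklmnopqrstuvwxyz23'
n = 4

zipped = [''.join(chunk) for chunk in zip(*[iter(a)] * n)]
sliced = [a[i:i+n] for i in range(0, len(a), n)]

print(len(a) % n)               # 2 characters cannot fill a complete chunk
print(''.join(sliced) == a)     # True: nothing lost
print(''.join(zipped) == a)     # False: the tail is dropped
print(a[len(a) - len(a) % n:])  # 23, the part zip discards
```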

Or, if you like regular expressions, this is also a pretty readable way to do it: 
>>> import re
>>> a
'0123456789abcdefghijklmnopqrstuvwxyz23'
>>> re.findall('.{1,4}', a)
['0123', '4567', '89ab', 'cdef', 'ghij', 'klmn', 'opqr', 'stuv', 'wxyz', '23']
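One caveat with the regex version: . does not match a newline by default, so a string containing '\n' gets split at the newlines and the newline characters themselves are skipped. Passing re.DOTALL makes . match any character. A minimal sketch with the chunk size as a parameter (regex_split is a hypothetical helper name):

```python
import re


def regex_split(s, n):
    # Greedily match runs of 1..n characters; re.DOTALL lets '.' match '\n' too
    return re.findall('.{1,%d}' % n, s, flags=re.DOTALL)


text = 'ab\ncdef'
print(re.findall('.{1,4}', text))  # ['ab', 'cdef']: the newline is skipped
print(regex_split(text, 4))        # ['ab\nc', 'def']
```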
#10
This is a good post