Python Forum

Full Version: split by character class
i want to split a string at the boundaries between different classes of characters. in my current case the string contains a few runs of digits, so i want something like docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2 to split up like ['docs/','3','.','4','.','10','/python-','3','.','4','.','10','-pdf-letter.tar.bz','2'].

way back in C i would have had to implement this the hard way: scan the string character by character, look up each character's class, and start a new result element whenever the class changes. the classes would be mapped by the caller, who might want everything that is not a digit grouped together. i don't want to do it that way in Python.
I think you already asked this question in the past. Here is the solution for sequences of digits
>>> import re
>>> re.split(r'(\d+)', "docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2")
['docs/', '3', '.', '4', '.', '10', '/python-', '3', '.', '4', '.', '10', '-pdf-letter.tar.bz', '2', '']
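The regex above only handles runs of digits. For the more general case where the caller decides what counts as a character class, one possible approach is itertools.groupby. This is only a sketch: the function name split_by_class and its classify parameter are made up for illustration and are not from the thread.

```python
from itertools import groupby

def split_by_class(text, classify=str.isdigit):
    """Split text wherever classify() returns a different value
    for two adjacent characters.

    classify is a hypothetical caller-supplied mapping from a single
    character to its class; the default groups digits vs. everything else.
    """
    # groupby collects consecutive characters that share the same class
    return ["".join(group) for _, group in groupby(text, key=classify)]

parts = split_by_class("docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2")
print(parts)
```

Unlike re.split with a capture group, this does not leave an empty string at the end when the input ends with a digit.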
thanks! yes, i could have asked this in the past. it has been on my mind in a number of forms for a while. in this case the intent is to next scan the list and find the length of the longest digit string, ignoring the elements that are not digits, then scan again and pad each digit string with leading zeros to make them all equal length. then the list can either be returned as the key as is, or joined back into one string and returned, to implement the numeric sort.
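The padding idea described above could be sketched roughly like this. The names padded_parts and numeric_sorted are hypothetical, and the sketch assumes the pad width is taken from the longest digit run across all the strings being sorted, since keys padded to different widths would not compare consistently.

```python
import re

def padded_parts(name, width):
    """Split name into digit and non-digit runs and zero-pad each
    digit run to width characters."""
    parts = re.split(r'(\d+)', name)
    # With one capture group, re.split puts the matched digit runs
    # at the odd indexes of the result.
    return [p.zfill(width) if i % 2 else p for i, p in enumerate(parts)]

def numeric_sorted(names):
    """Sort names so that embedded numbers compare by value."""
    # First scan: length of the longest digit run in any name (assumption:
    # the width is shared by all keys).
    width = max((len(run) for n in names for run in re.findall(r'\d+', n)),
                default=0)
    # Second scan happens inside the key function, which pads each run.
    return sorted(names, key=lambda n: padded_parts(n, width))

print(numeric_sorted(["python-3.4.10.tar", "python-3.4.9.tar", "python-3.10.0.tar"]))
```

Returning "".join(padded_parts(n, width)) as the key instead of the list would be the joined-string variant.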
it also works as re.split('(\d+)',...), i.e. not raw. apparently there is no \d backslash sequence in string literals, so the backslash is kept as is. but i think this is not a good way to code.
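A small check of why the raw prefix is the safer habit. Python leaves an unrecognized escape such as \d untouched in a normal string literal, but this has raised a DeprecationWarning since Python 3.6 and a SyntaxWarning since 3.12, so the sketch below spells the non-raw form with a doubled backslash instead. The variable names are only illustrative.

```python
import re

raw_pattern = r'(\d+)'       # raw string: backslash is kept literally
escaped_pattern = '(\\d+)'   # non-raw string: backslash must be doubled

# Both literals produce the same six-character pattern.
print(raw_pattern == escaped_pattern)

# A recognized escape shows the difference: '\n' is one newline character,
# while r'\n' is a backslash followed by the letter n.
print(len('\n'), len(r'\n'))

print(re.split(escaped_pattern, "python-3.4.10"))
```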