 split by character class
#1
i want to split a string at the boundaries between different classes of characters. in my current case a string has a few runs of digits, so i want something like docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2 to split up like ['docs/','3','.','4','.','10','/python-','3','.','4','.','10','-pdf-letter.tar.bz','2'].

way back in C i would have had to implement this the hard way: scan the string character by character, look up each character's class, and start a new result element whenever the class changes. the classes would be mapped by the caller, who might want everything that is not a digit grouped together. i don't want to do it that way in Python.
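For what it's worth, the scan described above does not have to be written by hand in Python. A minimal sketch using `itertools.groupby`, where `split_by_class` and its `classify` parameter are hypothetical names chosen for illustration and the default class test is `str.isdigit`:

```python
from itertools import groupby

def split_by_class(text, classify=str.isdigit):
    """Split text wherever classify(char) changes its value.

    classify is supplied by the caller and maps one character to its class;
    the default puts digits in one class and everything else in another.
    """
    # groupby yields one group per run of consecutive characters
    # that share the same classify() result.
    return ["".join(group) for _, group in groupby(text, key=classify)]

parts = split_by_class("docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2")
print(parts)
```

Unlike `re.split` with a capturing group, this never produces an empty string at either end of the result.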
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
I think you already asked this question in the past. Here is the solution for sequences of digits:
>>> import re
>>> re.split(r'(\d+)', "docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2")
['docs/', '3', '.', '4', '.', '10', '/python-', '3', '.', '4', '.', '10', '-pdf-letter.tar.bz', '2', '']
#3
thanks! yes, i could have asked this in the past. it has been on my mind in a number of forms for a while. in this case the intent is to scan the list next and find the length of the longest digit string, ignoring the elements that are not digits, then scan again and pad each digit string with leading zeros to make them all equal length. the list can then either be returned as the key as is, or joined back into one string and returned, to implement the numeric sort.
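A rough sketch of that sort key, under one assumption that differs from the description: the padding width is a fixed parameter rather than the longest digit run found in each string, because keys built from different strings only compare correctly when they all use the same width. The name `numeric_sort_key` and the default `width=10` are hypothetical.

```python
import re

def numeric_sort_key(text, width=10):
    """Return text with every run of digits zero-padded to width characters."""
    # The capturing group makes re.split keep the digit runs in the result.
    parts = re.split(r'(\d+)', text)
    # Pad only the digit runs; other pieces (including empty strings) pass through.
    return "".join(p.zfill(width) if p.isdigit() else p for p in parts)

names = ["python-3.4.10", "python-3.4.9", "python-3.10.0"]
ordered = sorted(names, key=numeric_sort_key)
print(ordered)
```

Digit runs longer than `width` are left unpadded, so the width has to be at least as large as the longest run expected in the data.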
#4
it also works as re.split('(\d+)',...), i.e. not raw. apparently that is because \d is not a recognized backslash escape in an ordinary string literal, so the backslash is left in place. but i think this is not a good way to code.
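A small check of that point, as a sketch: the raw literal and a normal literal with a doubled backslash are the same string, and either one avoids relying on an unrecognized escape (which recent Python versions warn about). The variable names are only for illustration.

```python
import re

raw_pattern = r'(\d+)'       # raw string: the backslash is taken literally
escaped_pattern = '(\\d+)'   # normal string: the backslash is escaped explicitly

# Both spellings produce the same five-character pattern.
same = raw_pattern == escaped_pattern
pieces = re.split(raw_pattern, "a1b22")
print(same, pieces)
```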