Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
split by character class
#1
i want to split a string at the boundary of different classes of character values. in my current case a string has a few ranges of digits so i want something like: docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2 to split up like ['docs/','3','.','4','.','10','/python-','3','.','4','.','10','-pdf-letter.tar.bz','2'].

way back in C i would have to implement this the hard way by scanning the string character by character and look up its class and split and step to the next result element when the character class changes, as mapped by the caller who might want everything not a digit to be grouped together. i don't want to do it that way in Python.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#2
I think you already asked this question in the past. Here is the solution for sequences of digits
>>> import re
>>> re.split(r'(\d+)', "docs/3.4.10/python-3.4.10-pdf-letter.tar.bz2")
['docs/', '3', '.', '4', '.', '10', '/python-', '3', '.', '4', '.', '10', '-pdf-letter.tar.bz', '2', '']
Reply
#3
thanks! yes, i could have asked this in the past. it has been on my mind in a number of forms for a while. in this case then intent is next to scan the list and find the length of the longest digits string, ignoring those that are not digits. the scan again and pad each digits string with leading zeros to make them all equal length. then the list can either be returned as the key as is or joined back into one string and returned to implement the numeric sort.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply
#4
it works as re.split('(\d+)',...), e.g. not raw. apparently there is no \d backslash sequence. but i think this is not a good way to code.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Class test : good way to split methods into several files paul18fr 4 402 Jan-30-2024, 11:46 AM
Last Post: Pedroski55
  [split] Class takes no arguments bily071 2 597 Oct-23-2023, 03:59 PM
Last Post: deanhystad
  How to split the input taken from user into a single character? mHosseinDS86 3 1,137 Aug-17-2022, 12:43 PM
Last Post: Pedroski55
  [solved] unexpected character after line continuation character paul18fr 4 3,292 Jun-22-2021, 03:22 PM
Last Post: deanhystad
  SyntaxError: unexpected character after line continuation character siteshkumar 2 3,106 Jul-13-2020, 07:05 PM
Last Post: snippsat
  [split] Python Class Problem astral_travel 12 4,818 Apr-29-2020, 07:13 PM
Last Post: michael1789
  how can i handle "expected a character " type error , when I input no character vivekagrey 2 2,673 Jan-05-2020, 11:50 AM
Last Post: vivekagrey
  Replace changing string including uppercase character with lowercase character silfer 11 6,069 Mar-25-2019, 12:54 PM
Last Post: silfer
  [split] python class JPan 6 3,495 Aug-19-2018, 04:58 PM
Last Post: JPan
  SyntaxError: unexpected character after line continuation character Saka 2 18,472 Sep-26-2017, 09:34 AM
Last Post: Saka

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020