 including the white space parts in str.split()
#1
i need to split a string with white space as the delimiter, but i also need to keep the exact string that each white space delimiter actually is. what i need is a list such that ''.join(the_list) gives back exactly the original string. is there a simple way to do this?
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
re.split() does that
>>> import re
>>> re.split(r"(\s+)", "The quick brown fox  jumps over \tthe lazy dog")
['The', ' ', 'quick', ' ', 'brown', ' ', 'fox', '  ', 'jumps', ' ', 'over', ' \t', 'the', ' ', 'lazy', ' ', 'dog']
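A small sketch to check the round-trip property asked for in #1. The helper name split_keep_whitespace is made up for illustration; the only real API used is re.split() with a capturing group, which keeps the delimiters in the result.

```python
import re

def split_keep_whitespace(text):
    # The capturing group (\s+) makes re.split() return the
    # whitespace runs as list items alongside the words.
    return re.split(r"(\s+)", text)

original = "The quick brown fox  jumps over \tthe lazy dog"
parts = split_keep_whitespace(original)

# Joining the pieces reproduces the input exactly.
print("".join(parts) == original)

# Leading or trailing whitespace produces empty strings at the
# ends of the list, which do not disturb the round trip.
padded = "  a b "
print(split_keep_whitespace(padded))
```

Words sit at the even indices and whitespace runs at the odd indices, so parts[::2] gives the same words as a plain split, apart from possible empty strings at the ends.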
#3
thank you! i still cannot comprehend how regular expressions work beyond a couple of basic things. but i assume you used a raw string so that \s would be passed to re.split() literally.
#4
Skaperen Wrote:i assume you used a raw string so that \s would be passed to re.split() literally.
Exactly. Regular expressions need to be written in the regular expression language, which contains special characters such as \ or (. In the case of \s, it doesn't make any difference, because the Python compiler doesn't interpret \s as an escape sequence in string literals (although Python 3.6 and later emit a warning for unrecognized escapes like this one). On the other hand, it does interpret other escape sequences, such as \n, in string literals. See for example:
>>> print("\n")


>>> print(r"\n")
\n
>>> print("\s")
\s
>>> print(r"\s")
\s
My advice is to use raw strings by default when writing regular expressions. One writes them for the re parser, not the Python parser.
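A case where the raw string does change the result, as a quick illustration (the sample text is made up): \b is a backspace character to the Python parser but a word boundary to the re parser.

```python
import re

text = "cat concat"

# Raw string: re receives backslash + b, meaning "word boundary".
with_raw = re.findall(r"\bcat", text)

# Normal string: Python turns \b into a backspace character (0x08)
# before re ever sees the pattern, so nothing in the text matches.
without_raw = re.findall("\bcat", text)

print(with_raw)
print(without_raw)
```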
#5
the logic this code needs is getting more and more complex, to the point that it is complicated. it is given a substring to look for in a larger string, starting at a given position and stopping at another given position. the substring to look for may contain a special single character meant to match a run of one or more white-space characters in the string being searched. since the code will go on to work with what follows the match, it needs to know where the match ends, which will vary depending on the runs of white-space that are matched. when it matches white-space, it must match the full run, not just part of it. however, the starting and ending positions in the larger string can cut off runs at each end.

as i work on this and try out all the jagged-corner cases, it seems i need to keep changing this all the time.
#6
Skaperen Wrote:the substring to look for may have a special single character meant to match up with a run of one or more white-space characters

This is what \s+ does in a regular expression. You could perhaps explain the real problem you're working on and the actual data that you need to match.
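A minimal sketch of that idea, with made-up sample data. A compiled pattern's search() method accepts start and end positions, and the match object reports where the match ends. \s+ is greedy, so it consumes the whole run of white space it can see.

```python
import re

line = "alpha   beta \t gamma"
pattern = re.compile(r"beta\s+gamma")

# search(string, pos, endpos) restricts the search to line[pos:endpos]
# while still reporting positions relative to the full string.
m = pattern.search(line, 0, len(line))
print(m.start(), m.end())

# If endpos cuts the string short, text past endpos is invisible
# to the pattern, so this search finds nothing.
cut = pattern.search(line, 0, 15)
print(cut)
```

Whether a white-space run that is cut off by the end position should count as a full run is a design decision for the tool; this sketch only shows what re does by default.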
#7
it is a sequence of characters given in a command argument, for a tool meant to act similarly to the cut command, that is used to parse each line for output. right now a _ is meant to match a run of white space, while \_ or an underscore in quotes just matches a literal underscore (in each line of input). i should probably make a version of this that somehow uses regular expressions, though i would have to ask others to test it.
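One possible way such a regular expression version could start, as a rough sketch only. The function name spec_to_regex and the exact rules are assumptions based on the description above: a bare _ becomes \s+, a backslash makes the next character literal, and everything else is matched literally. Quoting is not handled here.

```python
import re

def spec_to_regex(spec):
    """Translate a cut-like spec into a compiled regex (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            # Backslash: take the next character literally (e.g. \_ ).
            parts.append(re.escape(spec[i + 1]))
            i += 2
        elif ch == "_":
            # Bare underscore: match a full run of white space.
            parts.append(r"\s+")
            i += 1
        else:
            # Anything else is a literal character.
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))

run = spec_to_regex("a_b").search("x a   b y")
print(run.span())

literal = spec_to_regex(r"a\_b")
print(literal.search("a_b a b").span())
print(literal.search("a b"))
```

The end of each match comes from the match object's end() or span(), so the code that works with what follows does not have to track the length of the white-space run itself.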