 including the white space parts in str.split()
#1
i need to split a string with white space as the delimiter, but i also need to keep the string of what each white space delimiter actually is. what i need is a list of all the pieces, words and white space alike, such that ''.join() of that list would give back the exact original string. is there a simple way to do this?
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
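re.split() does that when the pattern is wrapped in a capturing group: the text matched by the group is then kept in the result list.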
>>> import re
>>> re.split(r"(\s+)", "The quick brown fox  jumps over \tthe lazy dog")
['The', ' ', 'quick', ' ', 'brown', ' ', 'fox', '  ', 'jumps', ' ', 'over', ' \t', 'the', ' ', 'lazy', ' ', 'dog']
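To check the round-trip property asked for in the first post, here is a minimal sketch. The helper name split_keep_whitespace is made up for illustration. Note that a string starting or ending with white space produces an empty string at that end of the list, which is harmless for the join.

```python
import re

def split_keep_whitespace(text):
    # Hypothetical helper: split on runs of whitespace and keep each run
    # as its own list item (that is what the capturing group does).
    return re.split(r"(\s+)", text)

samples = ["The quick  brown \tfox", "  leading and trailing \n", ""]
for sample in samples:
    parts = split_keep_whitespace(sample)
    # Joining the pieces gives back the original string.
    assert "".join(parts) == sample
    print(repr(sample), "->", parts)
```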
#3
thank you! i still cannot comprehend how regular expressions work other than a couple basic things. but i assume you used a raw string so that \s would be passed to re.split() literally.
#4
Skaperen Wrote:i assume you used a raw string so that \s would be passed to re.split() literally.
Exactly. Regular expressions need to be written in the regular expression language, which contains special characters such as \ or (. In the case of \s, it doesn't make any difference because the Python compiler doesn't interpret \s in string literals (although recent Python versions emit a warning for such unrecognized escapes). On the other hand, it does interpret other escape sequences such as \n in string literals. See for example
>>> print("\n")


>>> print(r"\n")
\n
>>> print("\s")
\s
>>> print(r"\s")
\s
My advice is to use raw strings by default when writing regular expressions. You write them for the re parser, not the Python parser.
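A case where the raw string does matter is \b. A small sketch, using a made-up sample string: in a normal string literal "\b" is a backspace character, while the re module reads \b as a word boundary.

```python
import re

text = "the quick fox"

# In a normal string "\b" is a backspace character (0x08), so the
# pattern looks for literal backspaces around "fox" and finds nothing.
plain = re.findall("\bfox\b", text)

# In a raw string the backslash reaches the re module untouched,
# where \b means "word boundary".
raw = re.findall(r"\bfox\b", text)

print(plain)  # []
print(raw)    # ['fox']
```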
#5
the logic this code needs keeps getting more and more complex, to the point that it is complicated. it gets a substring to look for in a larger string, starting at a given position and stopping at another given position. the substring to look for may have a special single character meant to match up with a run of one or more white-space characters in the string it is looking in. since the code will go on to work with what follows, it needs to know where the match comes to an end, which will vary depending on the runs of white-space that are matched. when white-space is matched, it needs to match the full run, not just part of it. however, the starting and ending positions given for the larger string can cut off runs at each end.

as i work on this and try out all the jagged-corner cases, it seems i need to keep changing this all the time.
#6
Skaperen Wrote:the substring to look for may have a special single character meant to match up with a run of one or more white-space characters

This is what \s+ does in a regular expression. You could perhaps explain the real problem you're working on and the actual data that you need to match.
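As a rough sketch of how that could cover the start/stop positions and the "where does the match end" part: a compiled pattern's search() accepts pos and endpos arguments, and the match object reports its end. The helper name find_with_end and the sample positions are assumptions for illustration, not the actual program.

```python
import re

def find_with_end(pattern, text, start=0, stop=None):
    # Hypothetical helper: search only inside text[start:stop] and
    # return (match_start, match_end), or None if nothing matches.
    if stop is None:
        stop = len(text)
    m = re.compile(pattern).search(text, start, stop)
    return (m.start(), m.end()) if m else None

text = "The quick brown fox  jumps over \tthe lazy dog"

# \s+ is greedy, so it takes the whole run of two spaces.
print(find_with_end(r"fox\s+jumps", text, 10, 30))  # (16, 26)

# The run after "over" is a space plus a tab; stopping at 32 cuts it off.
print(find_with_end(r"over\s+", text))              # (27, 33)
print(find_with_end(r"over\s+", text, 0, 32))       # (27, 32)
```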
#7
it is a sequence of characters given in a command argument, meant to act similar to the cut command, that parses each line for output. right now a _ is meant to match a run of white space, while \_ or an underscore in quotes just matches an underscore (in each line of input). i should probably make a version of this that somehow uses regular expressions, though i would have to ask others to test it.