Python Forum
including the white space parts in str.split()
#1
I need to split a string with whitespace as the delimiter, but I also need to keep the exact string of each whitespace delimiter. What I need is a list such that joining it with ''.join() gives back exactly the original string (plain the_string.split() discards the whitespace, so ''.join(the_string.split()) does not). Is there a simple way to do this?
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
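re.split() does that when the pattern is wrapped in a capturing group: the text matched by the group (here, each run of whitespace) is included in the result list.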
>>> import re
>>> re.split(r"(\s+)", "The quick brown fox  jumps over \tthe lazy dog")
['The', ' ', 'quick', ' ', 'brown', ' ', 'fox', '  ', 'jumps', ' ', 'over', ' \t', 'the', ' ', 'lazy', ' ', 'dog']
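A small wrapper makes the round trip explicit. This is a sketch added for illustration (the name split_keep_whitespace is hypothetical); it also shows one edge case worth knowing: when the string starts or ends with whitespace, re.split() puts an empty string at that end of the list, which does not affect ''.join().

```python
import re

def split_keep_whitespace(text):
    """Split text on runs of whitespace, keeping each run as its own item."""
    # The capturing group (\s+) makes re.split() return the separators too.
    return re.split(r"(\s+)", text)

s = "  The quick brown fox  jumps over \tthe lazy dog\n"
parts = split_keep_whitespace(s)
# Leading/trailing whitespace produces '' at the start/end of the list.
print(parts[:3])    # ['', '  ', 'The']
# Joining the pieces back together reproduces the original string exactly.
assert "".join(parts) == s
```

If the empty strings get in the way, [p for p in parts if p] removes them and still joins back to the same string.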
#3
Thank you! I still can't comprehend how regular expressions work beyond a couple of basic things, but I assume you used a raw string so that \s would be passed to re.split() literally.
#4
Skaperen Wrote:i assume you used a raw string so that \s would be passed to re.split() literally.
Exactly. Regular expressions have to be written in the regular expression language, which uses special characters such as \ and (. In the case of \s it makes no difference to the resulting string, because Python does not recognize \s as an escape sequence and leaves the backslash in place (although Python 3.12 and later emit a SyntaxWarning for such invalid escape sequences). On the other hand, Python does interpret other escape sequences, such as \n, in normal string literals. See for example
>>> print("\n")


>>> print(r"\n")
\n
>>> print("\s")
\s
>>> print(r"\s")
\s
My advice is to use raw strings by default when writing regular expressions. You write them for the re parser, not the Python parser.
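A quick illustration of a case where the raw string really does matter (an added example, not from the thread): \b means "word boundary" to the re module, but in a normal string literal Python first turns it into a backspace character, so the pattern silently stops matching.

```python
import re

text = "a cat scattered"
# Without r"", Python turns \b into a backspace character (\x08) before
# the re module ever sees the pattern, so nothing matches.
print(re.findall("\bcat\b", text))    # []
# With a raw string, re receives \b and treats it as a word boundary,
# so "cat" inside "scattered" is not matched.
print(re.findall(r"\bcat\b", text))   # ['cat']
```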
#5
The logic this code needs is getting more and more complex, to the point that it is complicated. It takes a substring to look for in a larger string, starting at one given position and stopping at another. The substring may contain a special single character meant to match a run of one or more whitespace characters in the string being searched. Since the code then works with whatever follows the match, it needs to know where the match ends, and that varies with the length of the whitespace runs that were matched. It must match the full run of whitespace, not just part of it. However, the start and end positions in the larger string can cut off runs at either end.

As I work on this and try out all the jagged corner cases, it seems I keep having to change it.
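Most of these requirements map onto features the re module already has, which is roughly what the next reply suggests. A minimal sketch, assuming the special character has already been translated to the regex \s+(?!\s) (the names WS_RUN and search_between are hypothetical): compiled patterns accept pos and endpos, so no slicing is needed, match.end() reports where the match stops, and the negative lookahead keeps a whitespace run from being split even if the next part of the pattern could itself match whitespace.

```python
import re

# Assumption: the "whole run of whitespace" character becomes this regex.
# \s+ is greedy; (?!\s) forbids ending the match in the middle of a run.
WS_RUN = r"\s+(?!\s)"

def search_between(text, pattern, start, end):
    """Return (match_start, match_end) of pattern within text[start:end], or None."""
    # Pattern.search(string, pos, endpos) searches without copying the string;
    # endpos makes it behave as if the string ended there, so a run cut off
    # at `end` still counts as a complete run.
    m = re.compile(pattern).search(text, start, end)
    return (m.start(), m.end()) if m else None

line = "fox   jumps  over"
print(search_between(line, "fox" + WS_RUN + "jumps", 0, len(line)))  # (0, 11)
print(search_between(line, "fox" + WS_RUN, 0, len(line)))            # (0, 6)
# Stopping at index 5 cuts the run after "fox"; the match ends at `end`.
print(search_between(line, "fox" + WS_RUN, 0, 5))                    # (0, 5)
```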
#6
Skaperen Wrote:the substring to look for may have a special single character meant to match up with a run of one or more white-space characters

This is what \s+ does in a regular expression. You could perhaps explain the real problem you're working on and the actual data that you need to match.
#7
It is a sequence of characters given as a command argument to a tool meant to act like the cut command, parsing each line of input for output. Right now a _ matches a run of whitespace, while \_ (or an underscore inside quotes) matches a literal underscore in each input line. I should probably make a version of this that uses regular expressions, though I would have to ask others to test it.
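One way such a regex-based version could start is by translating the argument into a regular expression. This is a hypothetical sketch (cut_pattern_to_regex is not from the thread) that handles only the two rules stated above: _ becomes a full whitespace run and \_ becomes a literal underscore. The quoting rule is left out, and every other character is escaped with re.escape() so it matches literally.

```python
import re

def cut_pattern_to_regex(spec):
    """Translate the _ / \\_ notation into a regular expression string.

    Assumed rules (hypothetical, based on the description above):
      _    -> a full run of one or more whitespace characters
      \\_  -> a literal underscore
    Every other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(spec):
        if spec.startswith("\\_", i):
            parts.append("_")            # escaped underscore: literal "_"
            i += 2
        elif spec[i] == "_":
            parts.append(r"\s+(?!\s)")   # whole whitespace run, never split
            i += 1
        else:
            parts.append(re.escape(spec[i]))  # anything else is literal
            i += 1
    return "".join(parts)

rx = cut_pattern_to_regex(r"user_id\_42")
print(rx)                                      # user\s+(?!\s)id_42
print(re.search(rx, "user \t id_42").group())  # matches 'user \t id_42'
```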