Posts: 4,646
Threads: 1,493
Joined: Sep 2016
i need to split a string using white space as the delimiter, but i also need to keep each white space delimiter exactly as it appeared in the string. what i need is a list such that ''.join() of that list gives back the exact original string (plain the_string.split() throws the white space away). is there a simple way to do this?
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
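One way to get such a list, as a minimal sketch: re.split() keeps the delimiters in its result when the pattern is wrapped in a capturing group. The helper name split_keep_ws is made up for illustration.

```python
import re

def split_keep_ws(text):
    # The parentheses form a capturing group, so re.split() returns the
    # matched white-space runs as list items alongside the words.
    return re.split(r"(\s+)", text)

original = "  alpha \t beta\n"
pieces = split_keep_ws(original)
print(pieces)  # ['', '  ', 'alpha', ' \t ', 'beta', '\n', '']

# Joining the pieces reproduces the original string exactly.
assert "".join(pieces) == original
```

Leading or trailing white space shows up as an empty string at the matching end of the list, which is what keeps the round trip exact.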
thank you! i still cannot comprehend how regular expressions work beyond a couple of basic things. but i assume you used a raw string so that the \s would be passed to re.split() literally.
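That assumption about raw strings can be checked directly. A small sketch: in a raw string the backslash is kept as an ordinary character, so the regex engine receives the two characters \ and s and interprets them itself.

```python
# A raw string keeps the backslash; the non-raw equivalent has to double it.
raw = r"\s+"
doubled = "\\s+"
print(raw == doubled)  # True

# The difference matters most for escapes Python itself understands:
# r"\n" is two characters (backslash, n), while "\n" is a single newline.
raw_len = len(r"\n")
plain_len = len("\n")
print(raw_len, plain_len)  # 2 1
```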
the logic this code needs keeps getting more complex, to the point where it is hard to manage. it is given a substring to look for in a larger string, starting at a given position and stopping at another given position. the substring may contain a special single character that is meant to match a run of one or more white-space characters in the string being searched. since the code will go on to work with what follows the match, it needs to know where the match ends, which will vary with the length of the white-space runs that are matched. when it matches white space, it must match the full run, not just part of it. however, the starting and ending positions given for the larger string can cut off runs at each end.
as i work on this and try out all the jagged corner cases, it seems i need to keep changing it all the time.
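A rough sketch of how those requirements could be expressed with a regular expression, not a drop-in solution. The function name, its parameters, and the choice of "_" as the marker are assumptions for illustration. The string is sliced first, so a white-space run cut off by the start or end position still counts as a full run inside the window.

```python
import re

def find_with_ws_runs(text, needle, start=0, end=None, marker="_"):
    """Return (begin, stop) of the first match in text[start:end], or None."""
    window = text[start:end]
    # Literal pieces are escaped; each marker becomes a pattern that only
    # matches a whole white-space run (no white space just before or after).
    pieces = [re.escape(p) for p in needle.split(marker)]
    pattern = r"(?<!\s)\s+(?!\s)".join(pieces)
    m = re.search(pattern, window)
    if m is None:
        return None
    # Translate window offsets back to offsets in the full string.
    return start + m.start(), start + m.end()

line = "foo   bar  baz"
print(find_with_ws_runs(line, "foo_bar"))         # (0, 9)
print(find_with_ws_runs(line, "_baz"))            # (9, 14)
print(find_with_ws_runs(line, "_baz", start=10))  # (10, 14)
```

The returned stop offset is where the match comes to an end, so later parsing can continue from there however long the white-space run was.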
it is a sequence of characters given in a command argument, meant to act similarly to the cut command, that is used to parse each line for output. right now a _ is meant to match a run of white space, while \_ (or an underscore in quotes) just matches a literal underscore in each line of input. i should probably make a version of this that somehow uses regular expressions, though i would have to ask others to test it.
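A possible starting point for the regular-expression version, under stated assumptions: spec_to_regex is a hypothetical helper, only the backslash escape is handled (quoting is left out), and everything other than a bare _ is treated as a literal character.

```python
import re

def spec_to_regex(spec):
    """Translate a spec where _ means a white-space run and \\_ a literal _."""
    out = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            # Backslash escape: take the next character literally.
            out.append(re.escape(spec[i + 1]))
            i += 2
        elif ch == "_":
            # Bare underscore: one or more white-space characters.
            out.append(r"\s+")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

print(spec_to_regex("a_b"))  # a\s+b
print(bool(re.fullmatch(spec_to_regex("a_b"), "a \t b")))   # True
print(bool(re.fullmatch(spec_to_regex(r"a\_b"), "a_b")))    # True
print(bool(re.fullmatch(spec_to_regex(r"a\_b"), "a   b")))  # False
```

Running the literal parts through re.escape() means characters such as . or * in the spec cannot be misread as regex syntax.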