Python Forum

Full Version: splitting a string by different splits
this is different than a previous question i asked back on 2016-12-26.

i have a string to split into 3 or more pieces where the separators are different. an example of what i mean is 'foo/bar:xyzzy' -> ['foo','bar','xyzzy'] given the string to split (given at run time) and the separators (known at coding time) in some form.

this is not hard to do.  two splits and it's done.  it can even be done in one line.  is there some nice way to code this that looks decent or cleaner?
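For reference, the "two splits" idea could be generalized into a small helper that takes the separators in the order they are expected. This is only a sketch: the name split_in_order and the choice to raise ValueError on a missing separator are assumptions for illustration, not something from the thread.

```python
def split_in_order(s, seps):
    """Split s on each separator in seps, in the order given (hypothetical helper)."""
    parts = []
    rest = s
    for sep in seps:
        # partition() splits at the first occurrence of sep only
        head, found, rest = rest.partition(sep)
        if not found:
            raise ValueError("separator %r not found in %r" % (sep, s))
        parts.append(head)
    parts.append(rest)
    return parts


print(split_in_order('foo/bar:xyzzy', ['/', ':']))
```

Unlike a split on "any of these characters", this version cares about the order of the separators, so 'foo:bar/xyzzy' with the same separator list would raise instead of splitting.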
Isn't this exactly the same question? If you're looking for a nicer look, you can put whatever fix you want into a function/module and call that when you want to split by numerous delimiters.

str.maketrans py3
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = s.maketrans('/:;', '...')
>>> s.translate(trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
py2
>>> import string
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = string.maketrans('/:;', '...')
>>> string.translate(s, trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
regex
>>> import re
>>> s
'foo/bar:xyzzy;daz'
>>> re.split('[:;/]', s)
['foo', 'bar', 'xyzzy', 'daz']
If you're parsing HTML, you shouldn't be using regex or maketrans, but only an HTML parser like BeautifulSoup or lxml. I rarely need to split by numerous delimiters. Most of the time, though, I just use split twice; it's more explicit about what my purpose is. If the splitting gets a little more involved, I might use maketrans or regex.
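As a rough sketch of the "put it in a function" idea using the regex route: the helper name split_any is made up for this example, and it assumes the delimiters are passed as a string or list of separators. re.escape is used so that separators like '.' or '|' are not read as regex syntax.

```python
import re


def split_any(s, delims):
    """Split s at every occurrence of any separator in delims (hypothetical helper)."""
    if not delims:
        raise ValueError("need at least one delimiter")
    # Escape each separator so it is matched literally, then join as alternatives.
    pattern = '|'.join(re.escape(d) for d in delims)
    return re.split(pattern, s)


print(split_any('foo/bar:xyzzy;daz', '/:;'))
```

Joining the escaped separators with '|' rather than building a character class means multi-character separators would work too, if delims is given as a list.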
In a lot of languages, the answer is going to be "use a regex".  You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:
>>> import re
>>> re.split(r"\W+", "what's the:big&idea?dude")
['what', 's', 'the', 'big', 'idea', 'dude']
(Feb-08-2017, 07:34 AM)nilamo Wrote: In a lot of languages, the answer is going to be "use a regex".  You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:

yeah, i think a regex is going to be the direct and concise way.
I finally have to learn the re module.  Regular expressions always bug me.
(Feb-08-2017, 05:31 AM)metulburr Wrote: str.maketrans py3
Matter of style of course, but in your first two solutions:
  1. You are arbitrarily using '.' as a new delimiter, which may be a problem (if the final purpose is to parse URLs, I hope you aren't parsing one with IP addresses)
  2. the delimiters are specified in several places
So my suggestions would be:
import string  # py2; on py3 use str.maketrans instead
delims='/:;'
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])
Of course trans also translates the first delimiter into itself, so if you insist you can use the more pedantic but less readable:
trans = string.maketrans(delims[1:], delims[0]*(len(delims)-1))
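For Python 3, the same idea might look like the sketch below, with str.maketrans in place of string.maketrans. The function name split_on_delims is an assumption for illustration; the approach (map every other delimiter onto the first one, then split on that) is the one described above.

```python
def split_on_delims(s, delims):
    """Split s on any single character in delims (hypothetical helper, py3)."""
    first = delims[0]
    # Map every delimiter except the first onto the first one ...
    table = str.maketrans(delims[1:], first * (len(delims) - 1))
    # ... so that a single split() on the first delimiter finishes the job.
    return s.translate(table).split(first)


print(split_on_delims('foo/bar:xyzzy;daz', '/:;'))
print(split_on_delims('172.31.0.0/16=10.3.0.0', '/='))
```

Because no new delimiter character is introduced, dots (or anything else) already in the string are left alone.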
Instead of dots why not spaces? Or '\x00'?
Quote:Instead of dots why not spaces? Or '\x00'?
the dot was just chosen as something that was not in the string already. But yes, I think ofnuts' method is better: it grabs an existing delimiter to split by instead of making up one that might be there unseen. Like I said, I very rarely do multiple splits. In most cases where the need seems to be there, BeautifulSoup or lxml does the job better. In the other cases, I admit I just use split a couple of times, as that is the most readable.

Quote:
delims='/:;'
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])
That would definitely be better if you're putting it in a function.
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module.  Regular expressions always bug me.

me, too.  i have run into issues with them, at times (before using python).

i am parsing a network config that includes descriptors of network subnets that are also translated to another address.

172.31.0.0/16=10.3.0.0

in this case i am not splitting addresses into octets, but i am splitting into base address, prefix length, and NAT address.

cidr, nat = s.split('=')
base, prelen = cidr.split('/')
i normally do not describe what i am doing, in order to keep things focused on the aspect i am dealing with at the time, which for this thread is the exploration of alternatives for coding the splits.  error checks may also need to be added so that users get meaningful error messages, as opposed to a python error traceback.
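One possible shape for the two splits with error checks added is sketched below. The function name parse_nat_entry, the wording of the messages, and the use of the standard ipaddress module for validation are all assumptions for illustration; the thread only gives the two split lines.

```python
import ipaddress


def parse_nat_entry(entry):
    """Split 'BASE/PREFIXLEN=NAT' into (base, prefix length, NAT address).

    Hypothetical helper: raises ValueError with a readable message
    instead of letting an unpacking error surface as a traceback.
    """
    cidr, eq, nat = entry.partition('=')
    if not eq:
        raise ValueError("missing '=' in %r" % entry)
    base, slash, prelen = cidr.partition('/')
    if not slash:
        raise ValueError("missing '/' in %r" % entry)
    try:
        # ip_network() rejects bad addresses, bad prefix lengths and set host bits
        network = ipaddress.ip_network("%s/%s" % (base, prelen))
        nat_addr = ipaddress.ip_address(nat)
    except ValueError as exc:
        raise ValueError("bad address in %r: %s" % (entry, exc)) from None
    return str(network.network_address), network.prefixlen, str(nat_addr)


print(parse_nat_entry('172.31.0.0/16=10.3.0.0'))
```

A caller could catch ValueError and print just the message for the user. Whether the validation belongs in the same function as the splitting is a design choice; it is combined here only to keep the sketch short.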
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module.  Regular expressions always bug me.

Get the Owl book. Accept no substitutes.