Posts: 4,646
Threads: 1,493
Joined: Sep 2016
this is different than a previous question i asked back on 2016-12-26.
i have a string to split into 3 or more pieces where the separators are different. an example of what i mean is 'foo/bar:xyzzy' -> ['foo','bar','xyzzy'] given the string to split (given at run time) and the separators (known at coding time) in some form.
this is not hard to do. two splits and it's done. it can even be done in one line. is there some nice way to code this that looks decent or cleaner?
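For reference, the "two splits in one line" baseline could be sketched as below. This is only an illustration under assumptions: the function name `split_two` is hypothetical, and it assumes exactly two separators, applied in order.

```python
def split_two(text, first_sep, second_sep):
    """Split on first_sep, then split every resulting chunk on second_sep."""
    # Nested comprehension: the outer loop walks the chunks from the first
    # split, the inner loop walks the pieces of each chunk.
    return [piece
            for chunk in text.split(first_sep)
            for piece in chunk.split(second_sep)]


parts = split_two('foo/bar:xyzzy', '/', ':')
print(parts)
```

It works, but the separators are hard to extend past two, which is what motivates the alternatives in the replies.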
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 5,151
Threads: 396
Joined: Sep 2016
Feb-08-2017, 05:31 AM
(This post was last modified: Feb-08-2017, 05:31 AM by metulburr.)
Isn't this exactly the same question? If you're looking for a clean look, you can put whichever fix you prefer into a function or module and call that whenever you need to split by several delimiters.
str.maketrans py3
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = s.maketrans('/:;', '...')
>>> s.translate(trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
py2
>>> import string
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = string.maketrans('/:;', '...')
>>> string.translate(s, trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
regex
>>> import re
>>> s
'foo/bar:xyzzy;daz'
>>> re.split('[:;/]', s)
['foo', 'bar', 'xyzzy', 'daz']
If you're parsing HTML, you shouldn't be using regex or maketrans, but only an HTML parser like BeautifulSoup or lxml. I rarely need to split by numerous delimiters. Most of the time I just use split twice; it is more explicit about what my purpose is. If the splitting gets a little more involved, I might use maketrans or regex.
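As a rough sketch of the "put it into a function" idea from the start of this post, the regex variant could be wrapped like this. The name `split_on_any` is hypothetical, and the sketch assumes every delimiter is a single character.

```python
import re


def split_on_any(text, delimiters):
    """Split text on any single character found in delimiters."""
    if not delimiters:
        raise ValueError("at least one delimiter is required")
    # re.escape protects characters such as ']' or '-' that would
    # otherwise have a special meaning inside a character class.
    pattern = '[' + re.escape(delimiters) + ']'
    return re.split(pattern, text)


print(split_on_any('foo/bar:xyzzy;daz', '/:;'))
```

The delimiters are then named in exactly one place, at the call site.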
Posts: 3,458
Threads: 101
Joined: Sep 2016
In a lot of languages, the answer is going to be "use a regex". You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:
>>> re.split(r"\W+", "what's the:big&idea?dude")
['what', 's', 'the', 'big', 'idea', 'dude']
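One detail worth checking with either pattern is how adjacent or trailing delimiters are handled. The snippet below is a small illustration; the sample string is made up for the purpose.

```python
import re

text = 'foo//bar:xyzzy;'  # doubled '/' and a trailing ';' (made-up sample)

# One delimiter per match: adjacent delimiters yield empty strings.
single = re.split('[:;/]', text)

# '+' treats a run of delimiters as one separator, but a trailing
# delimiter still leaves an empty string at the end.
runs = re.split('[:;/]+', text)

# Drop empty pieces if they are not meaningful for the data at hand.
non_empty = [piece for piece in runs if piece]

print(single)
print(runs)
print(non_empty)
```

Whether the empty strings should be kept depends on the input format; for fixed-field data they usually signal a malformed line rather than something to filter out silently.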
Posts: 4,646
Threads: 1,493
Joined: Sep 2016
(Feb-08-2017, 07:34 AM)nilamo Wrote: In a lot of languages, the answer is going to be "use a regex". You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:
>>> re.split(r"\W+", "what's the:big&idea?dude")
['what', 's', 'the', 'big', 'idea', 'dude']
yeah, i think a regex is going to be the direct and concise way.
Posts: 2,953
Threads: 48
Joined: Sep 2016
I finally have to learn the re module. Regular expressions always bug me.
Posts: 687
Threads: 37
Joined: Sep 2016
Feb-08-2017, 01:13 PM
(This post was last modified: Feb-08-2017, 04:04 PM by Ofnuts.)
(Feb-08-2017, 05:31 AM)metulburr Wrote:
str.maketrans py3
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = s.maketrans('/:;', '...')
>>> s.translate(trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
py2
>>> import string
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = string.maketrans('/:;', '...')
>>> string.translate(s, trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
regex
>>> s
'foo/bar:xyzzy;daz'
>>> re.split('[:;/]', s)
['foo', 'bar', 'xyzzy', 'daz']
Matter of style of course, but in your two first solutions:
- You are arbitrarily using '.' as a new delimiter, which may be a problem (if the final purpose is to parse URLs, I hope you aren't parsing one with IP addresses)
- delimiters are specified in several places
So my suggestions would be:
delims='/:;'
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])
Of course trans also translates the first delimiter into itself, so if you insist you can use the more pedantic but less readable:
trans = string.maketrans(delims[1:], delims[0]*(len(delims)-1))
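The snippets above use the Python 2 `string.maketrans`. A possible Python 3 equivalent, wrapped in a function, might look like the sketch below; the name `split_on_delims` is hypothetical and single-character delimiters are assumed.

```python
def split_on_delims(text, delims):
    """Map every delimiter onto the first one, then split on that one."""
    if not delims:
        raise ValueError("at least one delimiter is required")
    first = delims[0]
    # Python 3: str.maketrans replaces string.maketrans. Only the
    # delimiters after the first need to be translated.
    table = str.maketrans(delims[1:], first * (len(delims) - 1))
    return text.translate(table).split(first)


print(split_on_delims('foo/bar:xyzzy;daz', '/:;'))
```

Because the split character is taken from the delimiter set itself, no extra character such as '.' has to be assumed absent from the input.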
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
Posts: 2,953
Threads: 48
Joined: Sep 2016
Instead of dots why not spaces? Or '\x00'?
Posts: 5,151
Threads: 396
Joined: Sep 2016
Feb-08-2017, 02:22 PM
(This post was last modified: Feb-08-2017, 02:22 PM by metulburr.)
Quote:Instead of dots why not spaces? Or '\x00'?
The dot was just a character that was not in the string already. But yes, I think Ofnuts' method is better: it reuses an existing delimiter to split by instead of introducing one that might already be in the data unnoticed. Like I said, I very rarely do multiple splits. In most cases where the need seems to be there, BeautifulSoup or lxml does the job better. In the other cases, I admit I just use split a couple of times, as that is the most readable.
Quote:delims='/:;'
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])
That would definitely be better if you're putting it in a function.
Posts: 4,646
Threads: 1,493
Joined: Sep 2016
Feb-09-2017, 02:48 AM
(This post was last modified: Feb-09-2017, 03:09 AM by Skaperen.)
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module. Regular expressions always bug me.
me, too. i have run into issues with them, at times (before using python).
i am parsing a network config that includes descriptors of network subnets that are also translated to another address.
172.31.0.0/16=10.3.0.0
in this case i am not splitting addresses into octets, but i am splitting into base address, prefix length, and NAT address.
cidr, nat = s.split('=')
base, prelen = cidr.split('/')
i normally do not describe what i am doing overall, in order to keep things focused on the aspect i am dealing with at this time, which for this thread is the exploration of alternatives for coding the splits. error checks may also need to be added so that meaningful error messages can be produced for users, as opposed to a python error traceback.
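A minimal sketch of how those error checks might look, assuming Python 3 and its standard `ipaddress` module. The function name `parse_nat_rule` and the wording of the messages are hypothetical, not taken from the actual config parser.

```python
import ipaddress


def parse_nat_rule(line):
    """Parse 'BASE/PREFIXLEN=NAT' into (network, nat_address)."""
    cidr, sep, nat = line.strip().partition('=')
    if not sep:
        raise ValueError("missing '=' in rule %r" % line)
    if '/' not in cidr:
        raise ValueError("missing '/PREFIXLEN' in rule %r" % line)
    try:
        # ip_network is strict by default: host bits must be zero.
        network = ipaddress.ip_network(cidr)
        nat_address = ipaddress.ip_address(nat)
    except ValueError as exc:
        # Re-raise with the offending line so the user sees context.
        raise ValueError("bad rule %r: %s" % (line, exc))
    return network, nat_address


network, nat_address = parse_nat_rule('172.31.0.0/16=10.3.0.0')
print(network.network_address, network.prefixlen, nat_address)
```

Using `str.partition` instead of `split` avoids an unpacking error when the separator is missing, so the caller can catch a single `ValueError` and print its message rather than a traceback.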
Posts: 687
Threads: 37
Joined: Sep 2016
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module. Regular expressions always bug me.
Get the Owl book. Accept no substitutes.