Python Forum
splitting a string by different splits
#1
This is different from a previous question I asked back on 2016-12-26.

I have a string to split into three or more pieces, where the separators are different. An example of what I mean is 'foo/bar:xyzzy' -> ['foo','bar','xyzzy'], given the string to split (known at run time) and the separators (known at coding time) in some form.

This is not hard to do: two splits and it's done, and it can even be done in one line. Is there some nice way to code this that looks decent or cleaner?
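For comparison, here is a minimal sketch of the two-split approach described above, assuming the input always contains a '/' before the ':' part. `split_two` is a hypothetical name, not something from the thread:

```python
def split_two(s):
    """Split 'a/b:c' into ['a', 'b', 'c'] using two str.split calls."""
    first, rest = s.split('/', 1)     # 'foo', 'bar:xyzzy'
    return [first] + rest.split(':')  # ['foo', 'bar', 'xyzzy']

# The same idea squeezed onto one line:
s = 'foo/bar:xyzzy'
parts = [s.split('/', 1)[0]] + s.split('/', 1)[1].split(':')

print(split_two(s))  # ['foo', 'bar', 'xyzzy']
print(parts)         # ['foo', 'bar', 'xyzzy']
```

If the '/' is missing, the tuple unpacking raises ValueError, so real input may need a check first.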
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Isn't this exactly the same question? If you're looking for something that reads nicely, you can put whatever fix you want into a function or module and call that whenever you need to split by several delimiters.

Python 3 (str.maketrans):
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = s.maketrans('/:;', '...')
>>> s.translate(trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
Python 2 (string.maketrans):
>>> import string
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = string.maketrans('/:;', '...')
>>> string.translate(s, trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
Regex:
>>> import re
>>> s
'foo/bar:xyzzy;daz'
>>> re.split('[:;/]', s)
['foo', 'bar', 'xyzzy', 'daz']
If you're parsing HTML, you shouldn't be using regex or maketrans at all, only an HTML parser like BeautifulSoup or lxml. I rarely need to split by several delimiters, and most of the time I just use split twice, since it's more explicit about my purpose. If the splitting gets more involved, I might use maketrans or a regex.
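Along the lines of the "put it in a function" suggestion, a minimal sketch of a regex-based helper. `multisplit` is a hypothetical name, and it assumes `delims` is a non-empty string of single-character delimiters. `re.escape` keeps characters that are special inside a character class from being misread:

```python
import re

def multisplit(s, delims):
    """Split s on any single character in delims (hypothetical helper)."""
    # re.escape keeps characters such as '.', '-', ']' or '\\' literal
    pattern = '[' + re.escape(delims) + ']'
    return re.split(pattern, s)

print(multisplit('foo/bar:xyzzy;daz', '/:;'))  # ['foo', 'bar', 'xyzzy', 'daz']
print(multisplit('a.b-c', '.-'))               # ['a', 'b', 'c']
```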
#3
In a lot of languages, the answer is going to be "use a regex".  You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:
>>> import re
>>> re.split(r"\W+", "what's the:big&idea?dude")
['what', 's', 'the', 'big', 'idea', 'dude']
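One thing to watch with `\W+`: separators at the very start or end of the string produce empty strings, and `_` counts as a word character, so it is not split on. A small sketch (the sample strings are made up for illustration):

```python
import re

text = "?what's up?"
parts = re.split(r"\W+", text)
print(parts)  # ['', 'what', 's', 'up', '']

# Drop the empty strings produced by separators at either end
words = [p for p in parts if p]
print(words)  # ['what', 's', 'up']

# '_' is a word character, so \W+ does not split on it
print(re.split(r"\W+", "snake_case-name"))  # ['snake_case', 'name']
```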
#4
(Feb-08-2017, 07:34 AM)nilamo Wrote: In a lot of languages, the answer is going to be "use a regex".  You can split just by the two or three different things you want to, or you can go crazy and split by anything that's fishy:
>>> re.split(r"\W+", "what's the:big&idea?dude")
['what', 's', 'the', 'big', 'idea', 'dude']

Yeah, I think a regex is going to be the direct and concise way.
#5
I finally have to learn the re module. Regular expressions always bug me.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#6
(Feb-08-2017, 05:31 AM)metulburr Wrote: str.maketrans py3
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = s.maketrans('/:;', '...')
>>> s.translate(trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
py2
>>> import string
>>> s = 'foo/bar:xyzzy;daz'
>>> trans = string.maketrans('/:;', '...')
>>> string.translate(s, trans).split('.')
['foo', 'bar', 'xyzzy', 'daz']
regex
>>> s
'foo/bar:xyzzy;daz'
>>> re.split('[:;/]', s)
['foo', 'bar', 'xyzzy', 'daz']
It's a matter of style, of course, but in your first two solutions:
  1. You are arbitrarily using '.' as a new delimiter, which may be a problem (if the final purpose is to parse URLs, I hope you aren't parsing one with IP addresses).
  2. The delimiters are specified in several places.
So my suggestion would be:
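import string  # Python 2; in Python 3, use str.maketrans instead
delims = '/:;'
# map every delimiter onto the first one, then split on that one
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])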
Of course, trans also translates the first delimiter into itself, so if you insist, you can use the more pedantic but less readable:
trans = string.maketrans(delims[1:], delims[0]*(len(delims)-1))
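`string.maketrans` only exists in Python 2; in Python 3 the same idea can be written with `str.maketrans`. A minimal sketch as a function (`split_on_any` is a hypothetical name), assuming `delims` is a non-empty string of single characters. Because it reuses a real delimiter instead of '.', dotted IP addresses survive:

```python
def split_on_any(s, delims):
    """Split s on any character in delims by mapping them all onto delims[0].

    Python 3 sketch: str.maketrans replaces the Python 2 string.maketrans.
    """
    # Map every delimiter except the first onto the first one
    trans = str.maketrans(delims[1:], delims[0] * (len(delims) - 1))
    return s.translate(trans).split(delims[0])

print(split_on_any('foo/bar:xyzzy;daz', '/:;'))  # ['foo', 'bar', 'xyzzy', 'daz']
print(split_on_any('10.0.0.1/24:8080', '/:'))    # ['10.0.0.1', '24', '8080']
```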
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
#7
Instead of dots, why not spaces? Or '\x00'?
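A quick comparison of '.' versus '\x00' as the temporary delimiter, on a made-up input containing an IP address. It assumes the input never contains a NUL character; spaces would fail the same way '.' does here on fields that contain spaces:

```python
s = '10.0.0.1/24:8080'

# Using '.' as the replacement delimiter breaks the IP address apart
dot = s.translate(str.maketrans('/:', '..')).split('.')
print(dot)  # ['10', '0', '0', '1', '24', '8080']

# '\x00' is very unlikely to appear in text input, so it splits cleanly
nul = s.translate(str.maketrans('/:', '\x00\x00')).split('\x00')
print(nul)  # ['10.0.0.1', '24', '8080']
```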
#8
Quote: Instead of dots, why not spaces? Or '\x00'?
The dot was just a character that wasn't already in the string. But yes, I think Ofnuts' method is better, since it reuses an existing delimiter to split by instead of introducing one that might already be there unseen. Like I said, I very rarely do multiple splits. In most cases where the need seems to be there, BeautifulSoup or lxml does the job better. In the other cases, I admit I just call split a couple of times, as that is the most readable.

Quote:
delims='/:;'
trans = string.maketrans(delims, delims[0]*len(delims))
s.translate(trans).split(delims[0])
That would definitely be better if you're putting it in a function.
#9
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module. Regular expressions always bug me.

Me too. I have run into issues with them at times (before using Python).

I am parsing a network config that includes descriptors of network subnets that are also translated to another address:

172.31.0.0/16=10.3.0.0

In this case I am not splitting addresses into octets; I am splitting into base address, prefix length, and NAT address.

s = '172.31.0.0/16=10.3.0.0'
cidr, nat = s.split('=')        # '172.31.0.0/16', '10.3.0.0'
base, prelen = cidr.split('/')  # '172.31.0.0', '16'
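Since both separators here are single characters, the two splits could also be a single `re.split` call. A sketch, with one caveat: the character class accepts '/' and '=' in any order, so it is looser than the two-split version:

```python
import re

s = '172.31.0.0/16=10.3.0.0'
# One split on either '/' or '='; the '.' inside the addresses is untouched
base, prelen, nat = re.split(r'[/=]', s)
print(base, prelen, nat)  # 172.31.0.0 16 10.3.0.0
```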
I normally don't like to describe what I'm doing, to keep things focused on the aspect I'm dealing with at the moment; for this thread, that is exploring alternative ways to code the splits. Error checks may also need to be added so that meaningful error messages can be shown to users instead of a Python traceback.
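If user-friendly error messages are wanted, one option is to validate the whole line with a regex and let the standard `ipaddress` module check the addresses. A minimal sketch; `parse_nat_line` and `NAT_LINE` are hypothetical names, and the BASE/PREFIXLEN=NATADDR format is assumed from the example line above:

```python
import ipaddress
import re

# Assumed format: BASE/PREFIXLEN=NATADDR, e.g. 172.31.0.0/16=10.3.0.0
NAT_LINE = re.compile(r'(?P<base>[^/=]+)/(?P<prelen>\d+)=(?P<nat>[^/=]+)')

def parse_nat_line(line):
    """Return (network, nat_address) or raise ValueError with a readable message."""
    m = NAT_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f'expected BASE/PREFIXLEN=NATADDR, got {line!r}')
    try:
        # ip_network rejects bad addresses and bases with host bits set
        network = ipaddress.ip_network(f"{m['base']}/{m['prelen']}")
        nat = ipaddress.ip_address(m['nat'])
    except ValueError as exc:
        raise ValueError(f'bad address in {line!r}: {exc}') from None
    return network, nat

net, nat = parse_nat_line('172.31.0.0/16=10.3.0.0')
print(net, nat)  # 172.31.0.0/16 10.3.0.0

try:
    parse_nat_line('172.31.0.0=10.3.0.0')
except ValueError as err:
    print(err)  # expected BASE/PREFIXLEN=NATADDR, got '172.31.0.0=10.3.0.0'
```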
#10
(Feb-08-2017, 09:20 AM)wavic Wrote: I finally have to learn the re module. Regular expressions always bug me.

Get the Owl book (Jeffrey Friedl's Mastering Regular Expressions, O'Reilly). Accept no substitutes.