Python Forum
splitting on 2 or more possible characters
#1
This is easy enough to do but, IMHO, makes for some ugly code compared to splitting on a single character (a plain call to split()). A string might have "foo-bar" or it might have "foo_bar". What I want is to split it on either "-" or "_" so that I get ["foo","bar"] in both cases. There will never be both "-" and "_" together in one string, but I might have more than 2 characters to split on. The splitter characters can be provided in whatever form works best (a list, tuple, set, frozenset, or string). Ideally, I'd like to do this in one line without making a function or class, but I suspect making a function will be best.

Then, as the next split thing, I'd like to split on a substring, such as "foo123bar" being split by "123" to get ["foo","bar"]. I haven't put any thought into this yet, but I'd guess there are many implementations across a wide spectrum of quality.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
If there are only two symbols you want to split on, you can replace one with the other and then split.

>>> a = 'spam_ham-bacon_eggs'
>>> a.replace('_', '-').split('-')
['spam', 'ham', 'bacon', 'eggs']
This raises no error when a symbol is absent from the string, so there is no need to catch anything with try...except.
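A quick check of that behaviour, as a minimal sketch: when neither separator occurs in the string, `replace` returns it unchanged and `split` gives back a one-element list instead of raising. The sample strings are made up for illustration.

```python
# Hypothetical sample inputs: one per separator, plus one with no separator.
samples = ['foo-bar', 'foo_bar', 'foobar']

# Normalize '_' to '-' first, then split on '-'.
results = [s.replace('_', '-').split('-') for s in samples]

for s, parts in zip(samples, results):
    print(s, parts)
```

The last sample comes back as a list containing the whole string, so callers may still want to check the length of the result.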
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#3
OK, but what if there are 3 different characters to split on? 4? N? All of them in a container of a type you can choose?
#4
If 'first replace, then split' suits your needs, it is quite easy to implement with a container:

>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = ['_', '*', '!', '&']
>>> for char in breakpoints:    # one replace per separator, not per character of a
...     a = a.replace(char, '-')
...
>>> a.split('-')
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
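The same replace-then-split idea can also be done in a single pass with `str.translate`, which is a different technique from the loop above. This is only a sketch; it assumes every separator is a single character, since the dict form of `str.maketrans` requires one-character keys.

```python
a = 'bacon-eggs_spam*ham!foo&bar'
breakpoints = ['_', '*', '!', '&']

# Build a translation table mapping each breakpoint character to '-'.
# The dict form of str.maketrans only accepts single-character keys.
table = str.maketrans({char: '-' for char in breakpoints})

# Translate once, then split on the one remaining separator.
parts = a.translate(table).split('-')
print(parts)
```

`breakpoints` can be any iterable of single characters here (list, tuple, set, or a plain string).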
#5
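import re

a = 'bacon-eggs_spam*ham!foo&bar'
# Split on any single character from the class.
print(re.split(r'[-_*!&]', a))

b = 'foo123bar'
# Split on a literal substring.
print(re.split('123', b))

# Split on each digit; adjacent digits leave empty strings in the result.
c = re.split(r'\d', b)
print(c)  # unfiltered: ['foo', '', '', 'bar']
print([i for i in c if i != ''])  # filtered: ['foo', 'bar']

# \W matches any non-word character, which does not include '_'.
print(re.split(r'\W', a))
# Add '_' to the class to split on it as well.
print(re.split(r'[\W_]', a))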
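To take the separators from a container, as asked in the first post, one option is to build the pattern with `re.escape`. The helper name `split_on_any` is made up for this sketch, and it assumes a non-empty container of non-empty strings (an empty pattern would split between every character).

```python
import re

def split_on_any(text, separators):
    """Split text on any of the given separator strings (hypothetical helper)."""
    # Try longer separators first so one that starts with another is not cut short.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape makes characters such as '*' match literally.
    pattern = '|'.join(re.escape(sep) for sep in ordered)
    return re.split(pattern, text)

# Single characters supplied as a set.
print(split_on_any('bacon-eggs_spam*ham!foo&bar', {'-', '_', '*', '!', '&'}))
# A substring separator supplied as a list.
print(split_on_any('foo123bar', ['123']))
```

Because each separator is escaped, the same helper covers both the single-character case and the substring case.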
99 percent of computer problems exists between chair and keyboard.
#6
Normally I don't like re, but in this case I do like re.split.
#7
I personally find re.compile quite helpful when dealing with regular expressions. It makes the split expression more concise:

>>> import re
>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = re.compile('[-_*!&]')
>>> breakpoints.split(a)
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
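A compiled pattern is also convenient when the same separators apply to many strings. A small sketch, with made-up input strings, that reuses one pattern and shows the `maxsplit` argument of `Pattern.split`:

```python
import re

# Compile once and reuse for every string.
breakpoints = re.compile(r'[-_*!&]')

# Hypothetical sample inputs.
lines = ['foo-bar', 'foo_bar', 'spam*ham!eggs']
split_lines = [breakpoints.split(line) for line in lines]
print(split_lines)

# maxsplit limits the number of splits; the remainder stays in one piece.
head_tail = breakpoints.split('bacon-eggs_spam', maxsplit=1)
print(head_tail)
```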