Python Forum

splitting on 2 or more possible characters
this is easy enough to do but, IMHO, it makes for some ugly code compared to splitting on a single character (just call split()). a string might have "foo-bar" or it might have "foo_bar". what i want to do is split it on either "-" or "_" so that i get ["foo","bar"] in both cases. there will never be a case of both "-" and "_" together in one string, but i might have more than 2 characters to split on. the splitter characters can be provided in whatever form works best (a list, tuple, set, frozenset, or string). ideally, i'd like to do this in one line without making a function or class, but i suspect making a function will be best.

then, as the next split thing, i'd like to split on a substring, such as "foo123bar" being split by "123" to get ["foo","bar"]. i have not put any thought into this yet, but i'd guess there are many implementations across a wide spectrum of quality.
If there are only two symbols you want to split on, you can replace one with the other and then split.

>>> a = 'spam_ham-bacon_eggs'
>>> a.replace('_', '-').split('-')
['spam', 'ham', 'bacon', 'eggs']
It will not raise an error when a symbol is not present, so there is no need to wrap it in try...except.
ok, but now what if there are 3 different characters to split on? 4? N? all that are in a container of a type you can choose?
If 'first replace then split' suits your needs, it is quite easy to apply to a container of separators:

>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = ['_', '*', '!', '&']
>>> for char in breakpoints:
...     a = a.replace(char, '-')
...
>>> a.split('-')
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
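The same 'replace then split' idea can be done in a single pass with str.translate. This is only a sketch: split_on_chars is a made-up name, and it assumes every separator is a single character and that at least one is given.

```python
def split_on_chars(text, separators):
    """Split text on any single-character separator (hypothetical helper)."""
    seps = list(separators)  # accepts a list, tuple, set, frozenset or string
    first = seps[0]          # assumption: at least one separator is given
    # Map every other separator onto the first one, then split on that.
    table = str.maketrans({ch: first for ch in seps[1:]})
    return text.translate(table).split(first)


print(split_on_chars('bacon-eggs_spam*ham!foo&bar', '-_*!&'))
```

Because the separators are only iterated over, any of the container types mentioned in the question should work here.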
import re

a = 'bacon-eggs_spam*ham!foo&bar'
# single char
print(re.split('[-_*!&]', a))

b = 'foo123bar'
# multi-character substring
print(re.split('123', b))
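For the substring case, a literal separator does not strictly need re, since str.split accepts a multi-character separator. A short sketch follows; the '1.3' separator is invented here just to show why re.escape matters when the separator contains regex syntax.

```python
import re

b = 'foo123bar'
# A literal multi-character separator needs no regex at all.
plain = b.split('123')

# If re.split is used anyway, re.escape keeps characters such as '.'
# from being read as regex syntax ('1.3' is a made-up separator).
escaped = re.split(re.escape('1.3'), 'foo1.3bar')

print(plain)
print(escaped)
```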

import re

b = 'foo123bar'
c = re.split(r'\d', b)  # split on each digit
print(c)  # unfiltered: ['foo', '', '', 'bar']
print([i for i in c if i != ''])  # empty strings filtered out: ['foo', 'bar']
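If the empty strings only come from consecutive digits, a possible alternative to filtering is to treat each run of digits as one separator. A small sketch, using a second made-up input to show the limit of this approach:

```python
import re

b = 'foo123bar'
# '+' makes one separator out of each run of consecutive digits.
parts = re.split(r'\d+', b)
print(parts)  # ['foo', 'bar']

# Digits at either end of the string still leave an empty string there.
edges = re.split(r'\d+', '1foo2bar3')
print(edges)  # ['', 'foo', 'bar', '']
```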

import re

a = 'bacon-eggs_spam*ham!foo&bar'
print(re.split(r'\W', a))  # \W does not match '_', so 'eggs_spam' stays together
print(re.split(r'\W|[_]', a))  # also split on '_'
normally i don't like re but in this case i do like re.split.
I personally find re.compile quite helpful when dealing with regular expressions. It makes the split expression more concise:

>>> import re
>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = re.compile('[-_*!&]')
>>> breakpoints.split(a)
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
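To tie this back to the original question (separators supplied in a container, possibly longer than one character), the pattern could be built from the container rather than typed by hand. A minimal sketch, assuming a non-empty container of literal separators; split_any is a hypothetical name, not something from the re module.

```python
import re


def split_any(text, separators):
    """Split text on any of the given literal separators (hypothetical helper)."""
    # Longest first, so a longer separator wins over a shorter one it contains.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape makes characters such as '*' match literally.
    pattern = re.compile('|'.join(re.escape(sep) for sep in ordered))
    return pattern.split(text)


print(split_any('bacon-eggs_spam*ham!foo&bar', '-_*!&'))
print(split_any('foo123bar', ['123']))
print(split_any('foo-bar', {'-', '_'}))
```

Alternation is used instead of a character class so that the same helper covers both the single-character and the substring cases.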