Python Forum
splitting on 2 or more possible characters - Printable Version




splitting on 2 or more possible characters - Skaperen - Sep-02-2018

this is easy enough to do but, IMHO, makes for some ugly code compared to splitting on a single character (just call split()). a string might have "foo-bar" or it might have "foo_bar". what i want to do is split it on either "-" or "_" so that i get ["foo","bar"] in both cases. there will never be a case of both "-" and "_" together in one string, but i might have more than 2 characters to split on. the splitter characters can be provided in whatever form works best (a list, tuple, set, frozenset, or string). ideally, i'd like to do this in one line without making a function or class, but i suspect making a function will be best.

then, as the next split thing, i'd like to split on a substring, such as "foo123bar" being split by "123" to get ["foo","bar"]. i haven't put any thought into this yet, but i'd guess there are many implementations across a wide spectrum of quality.


RE: splitting on 2 or more possible characters - perfringo - Sep-02-2018

If there are only two symbols you want to split on then you can do replace and then split.

>>> a = 'spam_ham-bacon_eggs'
>>> a.replace('_', '-').split('-')
['spam', 'ham', 'bacon', 'eggs']
This does not raise an error when a symbol is absent from the string, so there is no need for try...except.


RE: splitting on 2 or more possible characters - Skaperen - Sep-02-2018

ok, but now what if there are 3 different characters to split on? 4? N? all that are in a container of a type you can choose?


RE: splitting on 2 or more possible characters - perfringo - Sep-02-2018

If 'first replace then split' suits your needs, it is quite easy to apply to containers:

>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = ['_', '*', '!', '&']
>>> for char in breakpoints:
...     a = a.replace(char, '-')
...
>>> a.split('-')
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
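
The same 'replace then split' idea could also be wrapped in a small function so the separators can come from any container. This is only a sketch: the name split_any is made up for illustration, and it assumes every separator is a single character (str.maketrans rejects longer dictionary keys).

```python
def split_any(text, seps):
    """Split text on any single character in seps (hypothetical helper)."""
    seps = list(seps)          # accepts list, tuple, set, frozenset or str
    if not seps:
        return [text]          # nothing to split on
    target = seps[0]
    # Map every other separator onto the first one, then split once.
    table = str.maketrans({s: target for s in seps[1:]})
    return text.translate(table).split(target)


print(split_any('bacon-eggs_spam*ham!foo&bar', '-_*!&'))
# ['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
```

Without a function, the one-line form of this would be a.translate(str.maketrans('_*!&', '----')).split('-'), at the cost of keeping the two argument strings the same length by hand.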



RE: splitting on 2 or more possible characters - Windspar - Sep-02-2018

import re

a = 'bacon-eggs_spam*ham!foo&bar'
# single char
print(re.split('[-_*!&]', a))

b = 'foo123bar'
# substring
print(re.split('123', b))
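
For a fixed substring such as "123", the regular expression is not strictly needed. As a minimal illustration (the variable name parts is chosen here only for the example), plain str.split accepts a multi-character separator:

```python
b = 'foo123bar'
parts = b.split('123')  # str.split treats '123' as one separator
print(parts)  # ['foo', 'bar']
```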

import re

b = 'foo123bar'
c = re.split(r'\d', b)  # split on each single digit
print(c)  # unfiltered: empty strings appear between adjacent digits
print([i for i in c if i != ''])  # empty strings filtered out
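
The empty strings come from splitting at every individual digit. One possible way to avoid the filtering step, shown here as a sketch with illustrative variable names, is to match a whole run of digits with \d+:

```python
import re

b = 'foo123bar'
per_digit = re.split(r'\d', b)   # one split per digit
per_run = re.split(r'\d+', b)    # one split per run of digits
print(per_digit)  # ['foo', '', '', 'bar']
print(per_run)    # ['foo', 'bar']
```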

import re

a = 'bacon-eggs_spam*ham!foo&bar'
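# \W matches any non-word character; '_' counts as a word character, so it is not split on
print(re.split(r'\W', a))
# add '_' as an alternative to split on it as well
print(re.split(r'\W|_', a))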
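
If the separators arrive in a container, as asked in the first post, the pattern could be built from it instead of being typed by hand. This is a sketch only: split_on_any is a hypothetical name, and it assumes the container is non-empty. re.escape keeps characters like * from being read as regex syntax, and the same function covers substrings such as "123".

```python
import re


def split_on_any(text, separators):
    """Split text on any of the given separator strings (hypothetical helper)."""
    # Longest first, so a separator is not cut short by one that is its prefix.
    ordered = sorted(separators, key=len, reverse=True)
    # Escape each separator so it is matched literally, then join as alternatives.
    pattern = '|'.join(re.escape(s) for s in ordered)
    return re.split(pattern, text)


print(split_on_any('bacon-eggs_spam*ham!foo&bar', ['-', '_', '*', '!', '&']))
# ['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']
print(split_on_any('foo123bar', ['123']))
# ['foo', 'bar']
```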



RE: splitting on 2 or more possible characters - Skaperen - Sep-02-2018

normally i don't like re but in this case i do like re.split.


RE: splitting on 2 or more possible characters - perfringo - Sep-03-2018

I personally find re.compile quite helpful when dealing with regular expressions. It makes the split expression more concise:

>>> import re
>>> a = 'bacon-eggs_spam*ham!foo&bar'
>>> breakpoints = re.compile('[-_*!&]')
>>> breakpoints.split(a)
['bacon', 'eggs', 'spam', 'ham', 'foo', 'bar']