Python Forum

Full Version: splitting a string by 2 characters
i want to split a string at every place where either of 2 different characters appears.

'some text<some tag>more text'.split2('<','>')

--> ['some text','some tag','more text']

yes, this example is splitting HTML, so that may be a bad example.  consider this outside of the scope of HTML so your thinking does not converge on HTMLParser.  Maybe it would be nice if str.split could be upgraded to do this like:

'some text<some tag>more text'.split(['<','>'])
Any reason why regex wouldn't work?
re.split('[<>]', your_string)
Obviously if HTML is an important part of this, you need to consider things carefully.
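A minimal sketch of the regex suggestion, generalized to any set of single-character separators. The helper name `split_any` is hypothetical (it is not part of `str` or `re`), and the sketch assumes at least one separator is given. `re.escape` is applied so that characters such as `]` or `^` are treated literally inside the character class.

```python
import re

def split_any(text, separators):
    """Split text at every occurrence of any single character in separators.

    split_any is a hypothetical helper name, not a standard-library function.
    Assumes separators is a non-empty string of single characters.
    """
    # Build a character class such as [<>]; re.escape keeps characters
    # like ']' or '^' from being read as regex syntax.
    pattern = '[' + ''.join(re.escape(ch) for ch in separators) + ']'
    return re.split(pattern, text)

print(split_any('some text<some tag>more text', '<>'))
# ['some text', 'some tag', 'more text']
```

Two separators in a row produce an empty string between them, the same way `str.split` with an explicit separator does.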
Here! I am replacing each of < and > with two commas, because that sequence is quite unlikely to appear in any text. Then I split on ',,'.

>>> s = 'some text<some tag>more text'.translate({60: ',,', 62: ',,'}).split(',,')
>>> s
['some text', 'some tag', 'more text']
>>> 
And it's faster, as far as I know.
(Dec-26-2016, 06:12 AM) micseydel wrote: Any reason why regex wouldn't work?
i've had cases where regex has failed, so i generally avoid it now.  i did not evaluate it for this.

(Dec-26-2016, 06:12 AM) micseydel wrote: Obviously if HTML is an important part of this, you need to consider things carefully.
i had a case which did involve HTML and thinking about it led me to HTMLParser.  this also left me with the thought about the 2 character splitting.

(Dec-26-2016, 07:01 AM) wavic wrote: Here! I am replacing each of < and > with two commas, because that sequence is quite unlikely to appear in any text. Then I split on ',,'.

>>> s = 'some text<some tag>more text'.translate({60: ',,', 62: ',,'}).split(',,')
>>> s
['some text', 'some tag', 'more text']
>>> 
And it's faster, as far as I know.

so why not:
'some text<some tag>more text'.translate({62: '<'}).split('<')
?
It could be :D It's a kind of optimization.
well... .translate() does not work for this under python 2 (2.7.12).  it's there, but:

Output:
lt1/forums /home/forums 9> py2
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'some text<some tag>more text'.translate({62: '<'}).split('<')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a string or other character buffer object
>>>
lt1/forums /home/forums 10>
ok, so how about:

Output:
lt1/forums /home/forums 10> py2
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '>'.join('some text<some tag>more text'.split('<')).split('>')
['some text', 'some tag', 'more text']
>>>
lt1/forums /home/forums 11>
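The join-then-split trick above can be extended to more than two separators. This is only a sketch: `split_by_join` is a hypothetical helper name, and it assumes a non-empty list of separators that do not overlap with each other. It uses nothing beyond `str.split` and `str.join`, so it should behave the same under Python 2 and Python 3.

```python
def split_by_join(text, separators):
    """Split text on several separators using only str.split and str.join.

    split_by_join is a hypothetical helper name for illustration.
    Assumes separators is a non-empty list of non-overlapping strings.
    """
    first = separators[0]
    # Rewrite every other separator into the first one...
    for sep in separators[1:]:
        text = first.join(text.split(sep))
    # ...so that a single split on the first separator finishes the job.
    return text.split(first)

print(split_by_join('some text<some tag>more text', ['<', '>']))
# ['some text', 'some tag', 'more text']
```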
translate() works in a different way in Python 2.7. You have to create a translation table.
>>> from string import maketrans
>>> s = 'some text<some tag>more text'.translate(maketrans('>', '<')).split('<')
>>> s
['some text', 'some tag', 'more text']
maketrans() in Python 2.7 works with single-byte characters. It builds a 256-character translation table covering every byte value, even if you only want to translate one character to another, and that table is then passed to the translate() method. In Python 3, where str is Unicode, translate() takes the translation table as a simple dictionary with the code point (ordinal) of each character as a key.
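For comparison, a short Python 3 sketch of the dictionary form of the table. The variable names are illustrative only. It shows a hand-written dict next to the equivalent one produced by the built-in `str.maketrans`.

```python
text = 'some text<some tag>more text'

# Hand-written table: keys are code points (integers), values are replacements.
table_by_hand = {ord('>'): '<'}

# str.maketrans builds the same kind of mapping from two equal-length strings.
table_from_maketrans = str.maketrans('>', '<')

print(table_from_maketrans)
# {62: 60}
print(text.translate(table_by_hand).split('<'))
print(text.translate(table_from_maketrans).split('<'))
# both print ['some text', 'some tag', 'more text']
```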

Joining and splitting works too :) It didn't cross my mind
a translate table for utf-8 would have its difficulties.  even for unicode it would at least be very large.  i can see the pluses of using a dictionary.  BTDT in C (used an AVL mapping i wrote).
maketrans() is more convenient for me because you pass only two sets of characters as parameters, plus a third one for those to delete. In Python 3 you can create the dict on your own, but str.maketrans() also exists there and takes exactly those three arguments. I don't use translate often.
Which reminds me to write a script to remove the digits from the file names of my mp3 collection.
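A possible starting point for that script, as a sketch only. `strip_digits` and `DELETE_DIGITS` are hypothetical names, the sample file name is made up, and nothing is renamed on disk here. The function only computes the new name, leaving the extension alone so that the "3" in ".mp3" survives. It assumes Python 3, where `str.maketrans` accepts a third argument listing characters to delete.

```python
import os.path
import string

# The third argument of str.maketrans lists characters to delete.
DELETE_DIGITS = str.maketrans('', '', string.digits)

def strip_digits(filename):
    """Return filename with ASCII digits removed from the stem only.

    strip_digits is a hypothetical helper; the extension is kept untouched.
    """
    stem, ext = os.path.splitext(filename)
    # Remove the digits, then trim whitespace left behind at either end.
    return stem.translate(DELETE_DIGITS).strip() + ext

print(strip_digits('01 Some Song.mp3'))
# Some Song.mp3
```

Two files that differ only by their digits would end up with the same name, so a real script would need to check for collisions before renaming anything.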