Python Forum
splitting a string by 2 characters
#1
I want to split a string at each place where either of 2 different characters appears.

'some text<some tag>more text'.split2('<','>')

--> ['some text','some tag','more text']

Yes, this example splits HTML, so it may be a bad example. Consider this outside the scope of HTML so your thinking does not converge on HTMLParser. Maybe it would be nice if str.split could be upgraded to do this, like:

'some text<some tag>more text'.split(['<','>'])
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Any reason why regex wouldn't work?
re.split('[<>]', your_string)
Obviously if HTML is an important part of this, you need to consider things carefully.
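For readers who want to try the regex suggestion, here is a minimal sketch. The helper name split_any is hypothetical (it is not part of the standard library); it only wraps re.split with a character class built from the delimiters.

```python
import re

text = 'some text<some tag>more text'

# A character class matches any single character listed inside [...],
# so the string is split at every '<' and every '>'.
parts = re.split(r'[<>]', text)
print(parts)  # ['some text', 'some tag', 'more text']


def split_any(s, delimiters):
    """Split s at each occurrence of any single character in delimiters.

    Hypothetical helper. re.escape() guards characters such as ']' or '^'
    that are special inside a character class. delimiters must be non-empty.
    """
    return re.split('[' + re.escape(delimiters) + ']', s)


print(split_any('a,b;c', ',;'))  # ['a', 'b', 'c']
```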
#3
Here! I am replacing each of < and > with two commas because it's quite unlikely to see that in any text. Then I split on ',,'.

>>> s = 'some text<some tag>more text'.translate({60: ',,', 62: ',,'}).split(',,')
>>> s
['some text', 'some tag', 'more text']
>>> 
And it's faster, as far as I know.
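The speed claim is easy to check locally. The following is a rough sketch for Python 3 only; the function names via_translate and via_regex are made up for this comparison, and no timing result is claimed here because it depends on the machine and the Python version.

```python
import re
import timeit

text = 'some text<some tag>more text'

# ord() makes the magic numbers explicit: ord('<') == 60, ord('>') == 62.
table = {ord('<'): ',,', ord('>'): ',,'}


def via_translate(s):
    # Replace both delimiters with the ',,' marker, then split on it.
    return s.translate(table).split(',,')


def via_regex(s):
    # Split directly at either delimiter.
    return re.split('[<>]', s)


# Both approaches agree on this input.
assert via_translate(text) == via_regex(text)

# Timings vary, so run this yourself rather than trusting any fixed number.
t_translate = timeit.timeit(lambda: via_translate(text), number=10_000)
t_regex = timeit.timeit(lambda: via_regex(text), number=10_000)
print(f'translate+split: {t_translate:.4f}s  re.split: {t_regex:.4f}s')
```

One caveat with the ',,' marker: if the input already contains two commas in a row, the translate version splits there too, while the regex version does not.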
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#4
(Dec-26-2016, 06:12 AM)micseydel Wrote: Any reason why regex wouldn't work?
I've had cases where regex failed, so I generally avoid it now. I did not evaluate it for this.

(Dec-26-2016, 06:12 AM)micseydel Wrote: Obviously if HTML is an important part of this, you need to consider things carefully.
I had a case which did involve HTML, and thinking about it led me to HTMLParser. It also left me with the thought about 2-character splitting.

(Dec-26-2016, 07:01 AM)wavic Wrote: Here! I am replacing each of < and > with two commas because it's quite unlikely to see that in any text. Then I split on ',,'.

>>> s = 'some text<some tag>more text'.translate({60: ',,', 62: ',,'}).split(',,')
>>> s
['some text', 'some tag', 'more text']
>>> 
And it's faster, as far as I know.

so why not:
'some text<some tag>more text'.translate({62: '<'}).split('<')
?
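That idea generalizes to any number of single-character delimiters: map all of them onto the first one and split once. A minimal Python 3 sketch, where split_on_chars is a hypothetical name and delimiters is assumed to be a non-empty string:

```python
def split_on_chars(s, delimiters):
    """Split s at every occurrence of any character in delimiters (Python 3)."""
    first = delimiters[0]
    # Map every other delimiter onto the first one, then split once.
    table = {ord(ch): first for ch in delimiters[1:]}
    return s.translate(table).split(first)


print(split_on_chars('some text<some tag>more text', '<>'))
# ['some text', 'some tag', 'more text']
```

Because no marker text is introduced, this avoids the risk of the marker already occurring in the input.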
#5
It could be :D It's a kind of optimization.
#6
Well... .translate() does not work this way under Python 2 (2.7.12). The method is there, but it rejects a dict:

Output:
lt1/forums /home/forums 9> py2
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'some text<some tag>more text'.translate({62: '<'}).split('<')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a string or other character buffer object
>>>
lt1/forums /home/forums 10>
ok, so how about:

Output:
lt1/forums /home/forums 10> py2
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '>'.join('some text<some tag>more text'.split('<')).split('>')
['some text', 'some tag', 'more text']
>>>
lt1/forums /home/forums 11>
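The split-then-join trick uses only str methods that behave the same in Python 2 and 3, and it extends to more than two delimiters. A small sketch, with split_by_joining as a hypothetical name and delimiters assumed to be a non-empty list of non-empty strings:

```python
def split_by_joining(s, delimiters):
    """Split s on several delimiter strings using only split() and join()."""
    last = delimiters[-1]
    for d in delimiters[:-1]:
        # Rewrite every occurrence of d as the last delimiter.
        s = last.join(s.split(d))
    # Only one kind of delimiter remains, so a single split finishes the job.
    return s.split(last)


print(split_by_joining('some text<some tag>more text', ['<', '>']))
# ['some text', 'some tag', 'more text']
```

Unlike the translate() variants, this also accepts multi-character delimiters such as '--' or '::'.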
#7
translate() works differently in Python 2.7: you have to create a translation table with string.maketrans().
>>> from string import maketrans
>>> s = 'some text<some tag>more text'.translate(maketrans('>', '<')).split('<')
>>> s
['some text', 'some tag', 'more text']
In Python 2.7, string.maketrans() works on 8-bit byte strings. It builds a 256-character translation table covering every byte value, even if you only want to map one character to another, and that table is then passed to the translate() method. In Python 3, where str is Unicode, translate() instead accepts a mapping (for example a plain dictionary) keyed by the Unicode code point of each character to be translated.
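For comparison, Python 3 offers str.maketrans(), which builds that code-point dictionary from two equal-length strings. A minimal sketch of the same split in Python 3:

```python
text = 'some text<some tag>more text'

# With two equal-length strings, str.maketrans() maps characters pairwise
# and returns a dict keyed by code point.
table = str.maketrans('>', '<')
print(table)  # {62: 60}

print(text.translate(table).split('<'))
# ['some text', 'some tag', 'more text']
```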

Joining and splitting works too :) It hadn't crossed my mind.
#8
A flat translate table for UTF-8 would have its difficulties. Even for Unicode code points it would at least be very large. I can see the advantages of using a dictionary. BTDT in C (I used an AVL mapping I wrote).
#9
maketrans() is more convenient for me because you only pass two equal-length sets of characters, plus (in Python 3's str.maketrans()) an optional third set of characters to delete. Building the dict by hand, as in the Python 3 examples above, is more work, although Python 3 does provide str.maketrans() so there is no need to write your own. I don't use translate often.
Which reminds me to write a script to remove the digits from the file names of my mp3 collection.
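As a rough illustration of the deletion argument, here is a Python 3 sketch that strips ASCII digits from a file name. The helper name without_digits and the sample file names are made up, and the code only computes the new name; it does not rename anything on disk.

```python
import os
import string

# The third argument to str.maketrans() lists characters to delete.
strip_digits = str.maketrans('', '', string.digits)


def without_digits(filename):
    """Remove ASCII digits from the stem, leaving the extension alone."""
    stem, ext = os.path.splitext(filename)
    # strip() drops the leading space left behind by a removed track number.
    return stem.translate(strip_digits).strip() + ext


print(without_digits('01 Some Song.mp3'))  # Some Song.mp3
```

Splitting off the extension first keeps the '3' in '.mp3' intact.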