Python Forum

Full Version: substring between substrings
i am looking for an existing function or i will code my own. given a main string and two substrings A and B ... if A and B are both in the main string and B is found after the end of A, return the substring found between A and B. if B is immediately after A then return an empty string. a way to also do this with byte strings is a big plus, even if it is a different function name. note that A and B can be strings longer than a single character and that A and B may be the same or may be different.

m = 'happy birthday to you'
a = 'happy '
b = ' to'
betweenstr(m,a,b) -> 'birthday'
a = 'b'
b = 'y'
betweenstr(m,a,b) -> 'irthda'
a = 'birth'
b = 'day'
betweenstr(m,a,b) -> ''
a = 'day'
b = 'happy'
betweenstr(m,a,b) -> None or an exception
i am not asking anyone to code this for me; i can do that. i am only interested in something in the Python library or something i code on my own, not something to be installed.
I would use regular expressions, like re.compile('(happy )(.*)( to)'). You could easily make a function that generates one for any two strings, and then searches a third string for matches.
how complicated does this get if A or B has characters that need to be escaped for re?
>>> m = 'happy birthday to you'
>>> a = 'happy'
>>> b = ' to'
>>> # slice past the end of a; lstrip(a) would strip a set of characters, not the prefix a
>>> start = m[m.find(a) + len(a):]
>>> start
' birthday to you'
>>> start[:start.find(b)]
' birthday'
>>> a = 'birth'
>>> b = 'day'
>>> start = m[m.find(a) + len(a):]
>>> start
'day to you'
>>> start[:start.find(b)]
''
i can do this without re. i already have. i'm just wondering what all it would take to do it with re. my worry is that strings could have regular expression characters in them so they would need processing to escape those meta characters. i wonder how much code that would involve. here is what i coded:

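def between(m,a,b):
    """Return the part of m between a and the first b after it (str or bytes).

    Returns None if a or b is not found, False if the arguments are unusable.
    """
    # m must support find(), and a and b must be the same type as m
    if not hasattr(m,'find'):
        return False
    if not isinstance(a,type(m)):
        return False
    if not isinstance(b,type(m)):
        return False
    # locate the first a, then step past it
    p = m.find(a)
    if p<0:
        return None
    p += len(a)
    # locate the first b at or after the end of a
    q = m.find(b,p)
    if q<0:
        return None
    return m[p:q]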
i would like to make a version of this that works to find the last instance. that is, if A and B appear in the main string more than once, i can get the last one. or maybe a generic version that gets the Nth instance with negative indexes to count from the end.
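One possible sketch of the Nth-instance idea is below. The name between_nth and the parameter n are hypothetical. It also assumes a particular meaning of "instance" that the post does not define: pairs are counted left to right without overlap, and the search resumes right after each matched B. It relies only on find() and slicing, so it should work for both str and bytes.

```python
def between_nth(m, a, b, n=0):
    """Return the n-th substring found between a and b in m, else None.

    Assumption: instances are counted left to right without overlap, and
    the search resumes right after each matched b. Negative n counts
    from the end, like a list index.
    """
    found = []
    pos = 0
    while True:
        p = m.find(a, pos)
        if p < 0:
            break
        p += len(a)
        q = m.find(b, p)
        if q < 0:
            break
        found.append(m[p:q])
        # always move forward, even if a and b are both empty
        pos = max(q + len(b), pos + 1)
    try:
        return found[n]
    except IndexError:
        return None
```

With m = '[one] [two] [three]', a = '[' and b = ']', between_nth(m,a,b,0) gives 'one', between_nth(m,a,b,-1) gives 'three', and an index out of range gives None. Collecting every match first is the simple choice; for very long strings, scanning from the end with rfind() might be preferable for the last instance.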
The re module comes with a function specifically for that: re.escape.
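A minimal sketch of how re.escape could be used here follows. The name between_re is hypothetical, and the non-greedy group with re.DOTALL is an assumption chosen so the result matches the find-based approach (the first A, then the first B after it, with newlines allowed in between). re.escape accepts both str and bytes and returns the same type it was given, so one function can cover both.

```python
import re

def between_re(m, a, b):
    """Return the text between the first a and the first b after it, else None."""
    # non-greedy capture group; encoded when the main string is bytes
    group = '(.*?)'
    if isinstance(m, bytes):
        group = group.encode('ascii')
    # re.escape neutralizes any regex metacharacters in a and b
    pattern = re.escape(a) + group + re.escape(b)
    found = re.search(pattern, m, re.DOTALL)
    return found.group(1) if found else None
```

For example, between_re('x(1+2)*y', '(', ')') gives '1+2' even though the parentheses are regex metacharacters. Mixing str and bytes arguments raises TypeError rather than returning False.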