Python Forum

Pages: 1 2

i have a string (str,bytes,bytearray) and an object with one or more characters (str,bytes,bytearray,set,frozenset,list,tuple). the beginning of the string has some number of characters that would get a True result if the in operator is used with that object while stepping through the string until it reaches a character that would get False. as this is hard to explain, here is some code:

def runlen(s,o):
    for n in range(len(s)):
        if s[n] in o:
            continue
        return n
    return len(s)  # every character of s is in o

what i would like to know is if there is a way to call something to do that loop internally so it would run faster. maybe re can do this, somehow. once i have the position n, i will be using s[:n] though not s[n:]. so, something that just gives me the prefix would do the job.

Yes, you could do a regex. If your string were supercalifragilistic and your set of OK characters were lrepscau, then the first failure should be the "i" in position number 8.

>>> import re
>>> s = "supercalifragilistic"
>>> chars = "lrepscau"
>>> re.match(fr"[{chars}]*", s)
<_sre.SRE_Match object; span=(0, 8), match='supercal'>

The match spans (0, 8), so the character at index 8 is the first one that did not match. Or you could use the inverse character class and search for the first character that is not allowed:

>>> re.search(fr"[^{chars}]", s)
<_sre.SRE_Match object; span=(8, 9), match='i'>

You can ask for the span() of any successful match.

>>> re.search(fr"[^{chars}]", s).span()
(8, 9)
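Since the goal is the prefix s[:n] rather than n itself, the match object from the first pattern can hand it over directly. A minimal sketch reusing s and chars from above; the names m, prefix and n are only for illustration, and it assumes chars contains nothing special inside a character class (such as ], ^, - or a backslash):

```python
import re

s = "supercalifragilistic"
chars = "lrepscau"

# [chars]* can match the empty string, so re.match always succeeds here
# and never returns None.
m = re.match(fr"[{chars}]*", s)
prefix = m.group()   # the matched text, same as s[:m.end()]
n = m.end()          # index of the first character not in chars
print(prefix, n)     # supercal 8
```

Using the "zero or more allowed characters" pattern avoids the no-match case that the inverse pattern runs into when every character of s is allowed.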

can you code it like a function replacing the one i posted (e.g. with no literals)?

There are no literals. I just used "chars" in place of your "o".

You'd need to add in some conditional to handle the cases when the match doesn't succeed, and you might need to convert your object with characters to a string. But after that, the match is good.

def runlen(s, o):
    return re.search(fr"[^{o}]", s).span()[0]
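The one-liner above raises AttributeError when every character of s is allowed, and it misbehaves if o contains characters that are special inside a character class. A more defensive sketch, under the assumptions that every member of o is a single character (or a single byte value) and that s and o are both text or both bytes-like; the helper names allowed and members are hypothetical:

```python
import re

def runlen(s, o):
    """Return the length of the leading run of s whose items are all in o."""
    if isinstance(s, str):
        # Escape each member so that ], ^, - and \ are taken literally.
        allowed = "".join(re.escape(c) for c in o)
        if not allowed:
            return 0          # "[]*" would be an invalid pattern
        pattern = "[" + allowed + "]*"
    else:
        # bytes/bytearray: members of o may be ints or length-1 bytes objects.
        members = (bytes([c]) if isinstance(c, int) else bytes(c) for c in o)
        allowed = b"".join(re.escape(b) for b in members)
        if not allowed:
            return 0
        pattern = b"[" + allowed + b"]*"
    # The * quantifier lets the pattern match an empty prefix, so re.match
    # always returns a match object and .end() is always an int.
    return re.match(pattern, s).end()

print(runlen("supercalifragilistic", "lrepscau"))  # 8
print(runlen(b"abc-def", {97, 98, 99}))            # 3
```

The pattern is rebuilt on every call; if o rarely changes, compiling it once with re.compile and reusing it would likely be cheaper.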

sometimes, the caller is working in bytes, or in bytearray, and passes those in. at least the return value does not have to match the input type, since it is always an int. there could be cases where the whole string is in the (class defined by the) object, which will often be a set (and may contain ints instead of length-1 bytes objects).

You can use the __contains__ dunder method

from itertools import takewhile
from more_itertools import ilen

def runlen(s, o):
    return ilen(takewhile(o.__contains__, s))

I don't know if it will run faster however.
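more_itertools is a third-party package; if it is not installed, the count can be done with sum from the standard library. Whether this beats the plain loop is best measured rather than guessed, so the sketch below times both with timeit. The function names, the test string and the repeat count are arbitrary choices for illustration, and no timing results are claimed here:

```python
import timeit
from itertools import takewhile

def runlen_loop(s, o):
    # Plain Python loop, returning len(s) when every item is in o.
    for n, c in enumerate(s):
        if c not in o:
            return n
    return len(s)

def runlen_takewhile(s, o):
    # Count the leading items of s for which `item in o` is true.
    return sum(1 for _ in takewhile(o.__contains__, s))

s = "supercal" * 1000 + "i"
o = set("lrepscau")

for f in (runlen_loop, runlen_takewhile):
    print(f.__name__, timeit.timeit(lambda: f(s, o), number=200))
```

Note that iterating over bytes or bytearray yields ints, so with a bytes-like s a set o has to contain ints (for example {97, 98}) rather than length-1 bytes objects.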

tell me more about __contains__. is it checking multiple characters?

(May-31-2020, 03:30 AM) bowlofred Wrote: There are no literals. I just used "chars" in place of your "o".

but the literals leave me guessing what is what. use another variable name if you wish. just have the prototype after "def" with names for the string and the object a character can be "in", and use those variable names in the code.

Skaperen Wrote: tell me more about __contains__. is it checking multiple characters?

There is not much to say about it. o.__contains__(x) has the same value as x in o.
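A small sketch of that equivalence for the built-in types mentioned in this thread. It also touches on the "multiple characters" question: what counts as contained depends on the type of o, so a str does substring matching while a set only compares whole elements. The sample values are made up for illustration:

```python
o = {"a", "b", "c"}
for x in "abz":
    # For built-in containers the two spellings give the same answer.
    assert o.__contains__(x) == (x in o)

# A str container does substring matching, so several characters can match.
assert "ab" in "abc"
# A set only compares whole elements, so the same test fails there.
assert "ab" not in set("abc")

# bytes accepts either an int or a bytes-like object on the left of `in`,
assert 97 in b"abc" and b"a" in b"abc"
# but a set of ints does not contain a length-1 bytes object.
assert b"a" not in {97, 98, 99}
```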

why would anyone use __contains__ if in gives the same value?

Because I can use o.__contains__ as a callable (a bound method) and pass it to other functions, which I cannot do with the in operator. For example, I can write takewhile(o.__contains__, s).
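To illustrate, here is a sketch of three ways to get a callable that means "x in o" and hand it to takewhile. The lambda and the functools.partial/operator.contains variants are alternatives not used in this thread, and the variable names are only for illustration:

```python
from functools import partial
from itertools import takewhile
from operator import contains

o = set("lrepscau")
s = "supercalifragilistic"

in_o = o.__contains__              # bound method: in_o(x) is x in o
by_lambda = lambda c: c in o       # wraps the operator in a function
by_partial = partial(contains, o)  # operator.contains(o, x) is x in o

# Each callable drives takewhile the same way and yields the same prefix.
prefixes = ["".join(takewhile(f, s)) for f in (in_o, by_lambda, by_partial)]
print(prefixes)  # ['supercal', 'supercal', 'supercal']
```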
