Python Forum

Full Version: Finding string in list item
Hello.

This is odd but I'm sure there is a perfectly good explanation.

Why does the test for "501" or "502" evaluate as true for every item in the list?

my_list = ["PT300-XXXX", "PB300-XXXX","PB501-XXXX", "PB102-XXXX","AL300-XXXX","BD502-XXXX"]

for node in my_list:
    if '501' or '502'  in node:
        print ("HIT")
        continue 
    print (node) 
OUTPUT:
HIT
HIT
HIT
HIT
HIT
HIT

If I look for each string individually it works.

my_list = ["PT300-XXXX", "PB300-XXXX","PB501-XXXX", "PB102-XXXX","AL300-XXXX","BD502-XXXX"]

for node in my_list:
    if '501' in node:
        continue 
    if '502' in node:
        continue
    print (node) 
OUTPUT:
PT300-XXXX
PB300-XXXX
PB102-XXXX
AL300-XXXX

Thanks
This is an interesting puzzle.

or is a logical operator. "a or b" returns a if a is true-ey, else it returns b. Most objects in Python are true-ey. The false-ey Python objects are False, None, numeric zero, empty strings and empty collections.
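A minimal sketch of the two points above, using only built-ins. The variable names (falsey, all_falsey, first, second) are made up for illustration.

```python
# Objects the post lists as false-ey: bool() of each one is False.
falsey = [False, None, 0, "", [], {}, ()]
all_falsey = all(not bool(obj) for obj in falsey)
print(all_falsey)

# `or` hands back one of its operands, not necessarily a bool.
first = "501" or "502"    # "501" is non-empty (true-ey), so it is returned as-is
second = "" or "502"      # "" is false-ey, so the right operand is returned
print(repr(first), repr(second))
```

The second print shows that the result of or is the operand itself, which is why the earlier print produced 501 rather than True.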

Look what happens when you evaluate just the condition of the if:
print("501" or "502" in "PT300-XXXX")
Output:
501
So when I use a or b in c it appears that the "b in c" part is ignored if a is true-ey. To help understand why, I wrote a function that I could disassemble.

test.py
def myfunc():
    return "501" or "502" in "PT300-XXXX"
Output:
>>> import dis, test
>>> dis.dis(test.myfunc)
  2           0 LOAD_CONST               1 ('501')
              2 JUMP_IF_TRUE_OR_POP     10
              4 LOAD_CONST               2 ('502')
              6 LOAD_CONST               3 ('PT300-XXXX')
              8 CONTAINS_OP              0
        >>   10 RETURN_VALUE
The first thing the function does is load '501'. If '501' is true-ey, execution jumps directly to offset 10, which returns "501". This is why your if statement always prints "HIT".

More interesting is what would happen if the first operand were false-ey. In that case the program loads '502' and 'PT300-XXXX' and tests if '502' is contained in 'PT300-XXXX'. This says that the if statement is evaluated like this:
if "501" or ("502" in node):
The "in" takes precedence over the "or". If we think of the if statement as "if a or b:", a = "501" and b = "502" in node.
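Another way to check the grouping, without reading bytecode, is to ask the parser directly. This is only a sketch using the standard ast module. It assumes Python 3.8 or later, where string literals are parsed as ast.Constant, and the name node inside the parsed string is just a placeholder that is never evaluated.

```python
import ast

# Parse the condition only; nothing is executed, so `node` need not exist.
tree = ast.parse('"501" or "502" in node', mode="eval")
top = tree.body                        # outermost operation of the expression

print(type(top).__name__)              # the `or` is the outermost operation
print(type(top.values[0]).__name__)    # left operand: the bare string "501"
print(type(top.values[1]).__name__)    # right operand: the comparison "502" in node
```

The outermost node is the or, and its right operand is the whole in comparison, which matches the parenthesized form shown above.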

To test the accuracy of this claim I replaced "501" with something false-ey. Now the program should only print "HIT" when node contains "502".
my_list = [
    "PT300-XXXX",
    "PB300-XXXX",
    "PB501-XXXX",
    "PB102-XXXX",
    "AL300-XXXX",
    "BD502-XXXX",
]

for node in my_list:
    if "" or "502" in node:
        print("HIT")
        continue
    print(node)
Output:
PT300-XXXX
PB300-XXXX
PB501-XXXX
PB102-XXXX
AL300-XXXX
HIT
Yes, I just noticed this.

my_list = ["PT300-XXXX", "PB300-XXXX","PB501-XXXX", "PB102-XXXX","AL300-XXXX","BD502-XXXX"]

for node in my_list:
    if ('501' or '502')  in node:
        print ("HIT")
        continue 
    print (node) 
OUTPUT:
PT300-XXXX
PB300-XXXX
HIT
PB102-XXXX
AL300-XXXX
BD502-XXXX
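A short sketch of why the parenthesized version above only catches the 501 item: the parentheses are evaluated first, and "501" or "502" is simply "501". The names needle and hits are made up for illustration.

```python
my_list = ["PT300-XXXX", "PB300-XXXX", "PB501-XXXX",
           "PB102-XXXX", "AL300-XXXX", "BD502-XXXX"]

needle = ("501" or "502")   # evaluated before `in`; the result is just "501"
print(repr(needle))

# The membership test therefore only ever looks for "501".
hits = [node for node in my_list if needle in node]
print(hits)
```

So "502" never takes part in the test at all, and BD502-XXXX falls through to the plain print.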
Found it. Have to explicitly specify the list item for each value.

my_list = ["PT300-XXXX", "PB300-XXXX","PB501-XXXX", "PB102-XXXX","AL300-XXXX","BD502-XXXX"]

for node in my_list:
    if '501' in node or '502' in node:
        print ("HIT")
        continue 
    print (node) 
OUTPUT:
PT300-XXXX
PB300-XXXX
HIT
PB102-XXXX
AL300-XXXX
HIT
You could also use "any".
my_list = [
    "PT300-XXXX",
    "PB300-XXXX",
    "PB501-XXXX",
    "PB102-XXXX",
    "AL300-XXXX",
    "BD502-XXXX",
]

for node in my_list:
    if any(x in node for x in ("501", "502")):
        print("HIT")
        continue
    print(node)
Since you only have two substrings to test it is clearer to just do the two tests. If there were many substrings I would use "any".
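If the list of substrings does grow, the any() test can be wrapped in a small helper. This is just a sketch; contains_any, skip and kept are hypothetical names, not anything from the standard library.

```python
def contains_any(text, needles):
    """Return True if at least one substring in needles occurs in text."""
    return any(needle in text for needle in needles)


my_list = ["PT300-XXXX", "PB300-XXXX", "PB501-XXXX",
           "PB102-XXXX", "AL300-XXXX", "BD502-XXXX"]
skip = ("501", "502")   # add more substrings here as needed

# Keep only the items that contain none of the substrings.
kept = [node for node in my_list if not contains_any(node, skip)]
print(kept)
```

Note that any() over an empty tuple is False, so with no substrings to look for nothing is skipped.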
Thanks for all of the info. It's still pretty confusing; it looks like the answer is that it just doesn't work that way.

So if it's evaluating an empty string it just ignores it and moves to the OR

print("" or "502" in "PT999-XXXX") # prints False
print("" or "502" in "PT502-XXXX") # prints True
print("501" or "502" in "PT502-XXXX") # prints 501

It's not clear why in the 3rd example 501 is true(?)

Like you stated when you tested the function:

"The first thing the function does is load '501'. If '501' is TRUE it jumps directly to line 10 which is the return value of the function. This is why your if statement is always true"

The rest of the statement is just ignored.

"for Comparison = for this to work normally either condition needs to be true. The compiler checks the first condition first and if that turns out to be true, the compiler runs the assigned code and the second condition is not evaluated. If the first condition turns out to be false, the compiler checks the second, if that is true the assigned code runs but if that fails too, false is returned to the if statement."

Yeah, why is the first condition TRUE?
This always prints "HIT" because the expression "501" or "502" in node always returns "501". Remember that "a or b" returns a if a is true-ey, else it returns b. In your expression a = "501" and b = ("502" in node). "501" is not an empty string, so it is true-ey.
if "501" or ("502" in node):
    print("HIT")
I can't remember where now, but I had a similar problem. Just need to think like a Python interpreter:

for node in my_list:
    if '501' in node or '502' in node:
        print ("HIT")
        continue 
    print (node) 
Also
import re

if re.search(r'50[12]', node):
    print('HIT')
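The character class 50[12] works here because the two substrings differ in one digit. For an arbitrary list of substrings, one possible sketch is to build an alternation instead. The names needles, pattern and hits are made up for illustration.

```python
import re

needles = ["501", "502"]

# re.escape guards against substrings that contain regex metacharacters.
pattern = re.compile("|".join(re.escape(n) for n in needles))

my_list = ["PT300-XXXX", "PB300-XXXX", "PB501-XXXX",
           "PB102-XXXX", "AL300-XXXX", "BD502-XXXX"]

hits = [node for node in my_list if pattern.search(node)]
print(hits)
```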