Feb-19-2021, 05:16 PM
When you set up a capturing regex, it numbers the capturing parentheses from left to right. So in a pattern like this:
>>> re.findall(r"(\w+\d+\w+)|(\d+\w+\d+)", "AAA1AAA")
[('AAA1AAA', '')]
you get a list with one tuple per match, and each element of the tuple is the capture from one capture group.
It's not the pipe character, it's the parentheses.
Even if only one can match, they're still numbered and set from left to right. So each match you get back is a tuple holding every capture group, with an empty string for any group that didn't take part in the match. To find what's in there, you can either loop through the elements of the tuple, or you can rewrite the regex so there's only one capture group (or none).
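Here is a rough sketch of the looping option. The helper name first_nonempty and the second input "1A1" are made up for illustration (it is an input that only the second alternative can match); the pattern is the one from above.

```python
import re

PATTERN = r"(\w+\d+\w+)|(\d+\w+\d+)"


def first_nonempty(groups):
    """Return the first non-empty capture from one findall() tuple."""
    for value in groups:
        if value:
            return value
    return ""


# findall() gives one tuple per match; keep whichever group actually matched.
raw = re.findall(PATTERN, "AAA1AAA 1A1")
matches = [first_nonempty(groups) for groups in raw]
print(raw)
print(matches)
```

Each tuple has exactly one non-empty slot here because the two groups sit on opposite sides of the alternation, so only one of them can take part in any given match.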
If the parenthesis starts with ?: then it won't be a capture group. That allows the pattern match to go back to "the entire pattern" and you don't have a tuple any longer.
>>> re.findall(r"(?:\w+\d+\w+)|(?:\d+\w+\d+)", "AAA1AAA")
['AAA1AAA']
>>> re.findall(r"(?:\w+\d+\w+)|(?:\d+\w+\d+)", "111A111")
['111A111']
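Two other ways to get plain strings back, sketched under the same assumptions (the input text is made up for illustration): wrap the whole alternation in a single capture group, or keep the capturing groups and ask each match object for group(0), which is always the entire match.

```python
import re

TEXT = "AAA1AAA 1A1"

# One capture group around the whole alternation:
# findall() returns that group's text as a plain string.
one_group = re.findall(r"(\w+\d+\w+|\d+\w+\d+)", TEXT)

# Groups left as capturing: group(0) is still the entire match,
# no matter which numbered group took part.
whole_matches = [m.group(0) for m in re.finditer(r"(\w+\d+\w+)|(\d+\w+\d+)", TEXT)]

print(one_group)
print(whole_matches)
```

The finditer version is handy when you want the full match and also want to check afterwards which group matched, since the match object still carries both.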