Python Forum

Full Version: Failing regex
Greetings!
I'm trying to find strings with a particular word(s).
Here are the words:
BG13TPPVxxxx -- where xxxx are the 4 digits
BG13TPPVxxxx(B or C) -- and the letters "B" or "C" at the end

I have this regex, but it is failing: it also picks up words with SPPV, like this:
BG13SPPVxxxx

if re.search('^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.',x) 
Thank you!
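A likely reason for the failure, as a minimal sketch: `|` has the lowest precedence in a regex, so the pattern above is read as two separate alternatives, and `[b|C]` is a character class that matches a lowercase `b`, a literal `|`, or `C`. The sample lines and the `grouped` pattern below are assumptions for illustration. In particular, the sketch assumes the trailing `\.` in the original is intentional, meaning a literal dot follows the word.

```python
import re

# The pattern from the question. Because | splits the whole expression,
# it means: (^[a-zA-Z]{2}\d{2}TPPV\d{4}\.)  OR  ([b|C]\.)
original = re.compile(r'^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.')

# Assumed fix: make the suffix an optional part of one single pattern.
grouped = re.compile(r'^[a-zA-Z]{2}\d{2}TPPV\d{4}[BC]?\.')

# Hypothetical sample lines, each ending in a literal dot.
samples = ["BG13TPPV1234.", "BG13TPPV1234B.", "BG13SPPV1234C.", "BG13SPPV1234."]

original_hits = [s for s in samples if original.search(s)]
grouped_hits = [s for s in samples if grouped.search(s)]

print(original_hits)  # the SPPV line ending in "C." slips in via the second alternative
print(grouped_hits)   # only the TPPV lines
```

With the original pattern, any line containing `C.` matches regardless of what comes before it, which would explain the SPPV hits.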
This should work for the strings that end in B or C.
>>> import re
>>> 
>>> s = 'BG13TPPV1234B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV1234B'
>>> 
>>> s = 'BG13TPPV9999C'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV9999C'
>>> 
>>> s = 'BG13SPPV1245B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
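The traceback above appears because `re.search` returns `None` when nothing matches. A small sketch of guarding against that follows. The helper name `find_code` is made up for illustration, and the pattern is the one from the session above.

```python
import re

# Same pattern as in the interactive session above.
pattern = re.compile(r"\w+\d{2}TPPV\d+[BC]")

def find_code(s):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(s)
    return m.group() if m else None

with_suffix = find_code('BG13TPPV1234B')
wrong_word = find_code('BG13SPPV1245B')
no_suffix = find_code('BG13TPPV1234')

print(with_suffix)  # the full string
print(wrong_word)   # None instead of an AttributeError
print(no_suffix)    # None, because [BC] is required by this pattern
```

Note that the string without a trailing B or C is not matched, since `[BC]` is mandatory here. Writing `[BC]?` instead would make the letter optional.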
I never thought about it as three different search words!
I'll try it...

Thank you!
This kind of works.
import re

text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D"

print(re.findall(r"\w+\d{2}TPPV\d{4}[BC]?", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C', 'BG13TPPV7890']
Notice that it matches part of "BG13TPPV7890D" because "BG13TPPV7890" is a match to the pattern.

A stricter match is possible if we are willing to specify the character that follows the string. Adding \W after the group says the string must be followed by a character that is not normally part of a word (whitespace, punctuation), and the parentheses keep that trailing character out of the result.
import re

text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D."

print(re.findall(r"(\w+\d{2}TPPV\d{4}[BC]?)\W", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C']
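One caveat with `\W` is that it has to consume a real character, so a word at the very end of the text is missed. A possible alternative, sketched here, is the word-boundary assertion `\b`, which matches a position rather than a character. The explicit two-letter, two-digit prefix is taken from the pattern in the original question. The sample text is a rearranged version of the one above, with the "C" word moved to the end.

```python
import re

# Valid "C" word placed last, with nothing after it.
text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV7890D...BG13TPPV4567C"

# \W must consume a character, so the final word is not found.
with_W = re.findall(r"(\w+\d{2}TPPV\d{4}[BC]?)\W", text)

# \b only asserts a word boundary, so start and end of text both qualify.
with_b = re.findall(r"\b[A-Za-z]{2}\d{2}TPPV\d{4}[BC]?\b", text)

print(with_W)
print(with_b)
```

Here `with_W` contains only the first two valid words, while `with_b` also picks up the last one and still rejects "BG13TPPV7890D".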