Greetings!
I'm trying to find strings that contain particular words.
Here are the words:
BG13TPPVxxxx -- where xxxx is any 4 digits
BG13TPPVxxxx(B or C) -- the same, with the letter "B" or "C" at the end
I have this regex, but it is failing: it also picks up words with SPPV, like this:
BG13SPPVxxxx
if re.search('^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.',x)
Thank you!
This should work for the strings that end in B or C, and it rejects the SPPV one.
>>> import re
>>>
>>> s = 'BG13TPPV1234B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV1234B'
>>>
>>> s = 'BG13TPPV9999C'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
'BG13TPPV9999C'
>>>
>>> s = 'BG13SPPV1245B'
>>> r = re.search(r"\w+\d{2}TPPV\d+[BC]", s)
>>> r.group()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
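As for why the pattern in the question picks up SPPV strings: the likely cause is that `|` has the lowest precedence in a regex, so it splits the whole pattern into two alternatives, and the second one (`[b|C]\.`) matches any "b", "|" or "C" followed by a dot, anywhere in the string. Below is a minimal sketch of that. The input strings are made up for illustration (I'm assuming the real ones have a dot after the code, since the pattern looks for one), and `fixed` is only one possible correction.

```python
import re

# The pattern from the question, unchanged.
original = r'^[a-zA-Z]{2}\d{2}TPPV\d{4}\.|[b|C]\.'

# One possible fix (an assumption, not the only option): make the
# trailing letter an optional character class instead of an alternation.
fixed = r'^[a-zA-Z]{2}\d{2}TPPV\d{4}[BC]?\.'

# Hypothetical inputs; the real strings may look different.
samples = ['BG13TPPV1234.txt', 'BG13TPPV1234B.txt', 'BG13SPPV1234C.txt']

for s in samples:
    hit_original = re.search(original, s) is not None
    hit_fixed = re.search(fixed, s) is not None
    print(s, hit_original, hit_fixed)
```

With these inputs the original pattern accepts the SPPV string (through the "C." alternative) and misses the TPPV string ending in "B", while the adjusted pattern accepts both TPPV strings and rejects the SPPV one.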
I never thought about it as three different search words!
I'll try it...
Thank you!
This kind of works.
import re
text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D"
print(re.findall(r"\w+\d{2}TPPV\d{4}[BC]?", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C', 'BG13TPPV7890']
Notice that it matches part of "BG13TPPV7890D" because "BG13TPPV7890" is a match to the pattern.
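A stricter match is possible if we are willing to specify the character that follows the string. This pattern says the string must be followed by something that is not normally part of a word, such as whitespace or punctuation. Note that \W has to match an actual character, so a string at the very end of the text would not be found.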
import re
text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D."
print(re.findall(r"(\w+\d{2}TPPV\d{4}[BC]?)\W", text))
Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C']
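Another option, sketched here rather than offered as the definitive answer, is to use word boundaries (`\b`) on both sides. A boundary matches a position instead of a character, so no capture group is needed and a string at the very end of the text can still match. The sketch also swaps `\w+` for the `[A-Za-z]{2}` prefix from the question, which assumes the code always starts with exactly two letters.

```python
import re

# Word boundaries on both sides: the whole "word" must fit the pattern.
# [A-Za-z]{2} is taken from the question's regex (an assumption about
# the prefix); [BC]? makes the trailing letter optional.
pattern = re.compile(r"\b[A-Za-z]{2}\d{2}TPPV\d{4}[BC]?\b")

# Same text as the first example, with no trailing "." needed.
text = ".BG13TPPV123..BG13TPPV2345...BG13TPPV3456B...BG13TPPV4567C...BG13TPPV7890D"

matches = pattern.findall(text)
print(matches)
```

Output:
['BG13TPPV2345', 'BG13TPPV3456B', 'BG13TPPV4567C']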