Python Forum

Full Version: Regex to find triple characters
I am looking for a regex pattern to find the instance of 3 matching characters in series in a string. The characters can be letters or numbers, but the run must be exactly 3 characters long, all the same character. For example, the string 'ab999thc7' would result in '999' being found, but 'abddddthc7' would not be valid because the repeated character, 'd', occurs 4 times in a row. Any assistance would be greatly appreciated.
import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']
This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
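To make the pieces of that pattern easier to follow, here is the same expression written with re.VERBOSE and one comment per part. This is only an annotated sketch of the pattern posted above; the variable name triple is made up for illustration.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
triple = re.compile(r"""
    (\w)        # group 1: first character of a candidate run
    (?<!\1\1)   # the character just before it must not be the same character
    \1\1        # two more copies of it, making three in a row
    (?!\1)      # and no fourth copy right after
""", re.VERBOSE)

print([m.group() for m in triple.finditer("AAAbbcDDDEDGGGG")])  # ['AAA', 'DDD']
```

The lookbehind runs after the first character has been consumed, so it looks back over two characters: the one just captured and the one before it.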
This seems to work, although it contains 3 times opening bracket, (, but only 2 times closing bracket, ), which is weird!

import re
string = 'AAABBCCCDDEEEFFGGGHHIIIJJKKK'
result = [match[0] for match in re.findall(r'((\w)\2{2,})', string)]
(May-14-2024, 05:59 AM)deanhystad Wrote: [ -> ]
import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads, or anything besides triples should not be found.
(May-14-2024, 08:14 AM)Gribouillis Wrote: [ -> ]This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
Thank you, this solution does work, but there is a compilation error stating that groups are not supported in lookbehinds. Is there any way to clear that up? Appreciate it.
Thank you, but this solution allows consecutive characters of more than 3 to slip through. I need the expression to specifically look for exactly 3 consecutive characters.
(May-14-2024, 12:29 PM)bfallert Wrote: [ -> ]there is a compilation error stating that groups are not supported in lookbehinds.
Which version of Python are you using? It works fine here in Python 3.10. The latest Python is 3.12 as of May 2024. Group references of fixed length have been allowed in lookbehind assertions since Python 3.5 (2015).
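If the lookbehind form is rejected by a particular interpreter or regex engine, one possible workaround is to match each whole run of a repeated character and keep only the runs of length 3. This is a sketch, not something posted in the thread, and the helper name exact_triples is hypothetical.

```python
import re

def exact_triples(text):
    # (\w)\1* greedily matches a maximal run of one repeated word character;
    # the length check then keeps only runs that are exactly 3 long.
    return [m.group() for m in re.finditer(r"(\w)\1*", text) if len(m.group()) == 3]

print(exact_triples("ab999thc7"))        # ['999']
print(exact_triples("abddddthc7"))       # []
print(exact_triples("AAAbbcDDDEDGGGG"))  # ['AAA', 'DDD']
```

Because the run is consumed in full before its length is checked, no lookaround is needed.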
Quote: I am looking for a regex pattern to find the instance of 3 matching characters in series in a string.

This finds all instances of "3 matching characters in series", as stated above.

Is that NOT what you want?

import re
string = 'AAAABBBBCCCC11212222333444455555666666'
result = [match[0] for match in re.findall(r'((\w)\2{2,2})', string)]
print("result", result)
Output:
result ['AAA', 'BBB', 'CCC', '222', '333', '444', '555', '666', '666']
(May-14-2024, 12:27 PM)bfallert Wrote: [ -> ]
(May-14-2024, 05:59 AM)deanhystad Wrote: [ -> ]
import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads, or anything besides triples should not be found.

Change the repeat count from {1,} to {2}.
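For reference, here is a small sketch of what that change does on the same test string. Note that {2} alone still picks three characters out of a longer run, so it finds every three-in-a-row but does not by itself exclude runs of four or more.

```python
import re

text = "AAAbbcDDDEDGGGG"
# (.)\1{2} means one character followed by exactly two more copies of it.
matches = [m.group() for m in re.finditer(r"(.)\1{2}", text)]
print(matches)  # ['AAA', 'DDD', 'GGG']
```

The 'GGG' comes from the first three characters of 'GGGG'.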