Posts: 9
Threads: 2
Joined: Apr 2024
I am looking for a regex pattern to find the instance of 3 matching characters in series in a string. The characters can be letters or numbers, but can only be 3 characters in length and all be the same character. For example, the string 'ab999thc7' would result in '999' being found, but 'abddddthc7' would not be valid due to the repeated character, 'd', being a length of 4. Any assistance would be greatly appreciated.
Posts: 6,778
Threads: 20
Joined: Feb 2020
import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches) Output: ['AAA', 'bb', 'DDD', 'GGGG']
Posts: 4,780
Threads: 76
Joined: Jan 2018
This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>>
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
« We can solve any problem by introducing an extra level of indirection »
Posts: 1,090
Threads: 143
Joined: Jul 2017
This seems to work, although it contains 3 times opening bracket, (, but only 2 times closing bracket, ), which is weird!
string = 'AAABBCCCDDEEEFFGGGHHIIIJJKKK'
result = [match[0] for match in re.findall(r'((\w)\2{2,})', string)]
Posts: 9
Threads: 2
Joined: Apr 2024
(May-14-2024, 05:59 AM)deanhystad Wrote: import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches) Output: ['AAA', 'bb', 'DDD', 'GGGG']
Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads or others beside triples should not be found.
Posts: 9
Threads: 2
Joined: Apr 2024
(May-14-2024, 08:14 AM)Gribouillis Wrote: This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>>
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD'] Thank you, this solution does work but there is a compilation error stating that groups are not supported in lookbehinds. Any way to clear that up. Appreciate it.
Posts: 9
Threads: 2
Joined: Apr 2024
Thank you, but this solution allows consecutive characters of more than 3 to slip through. I need the expression to specifically look for exactly 3 consecutive characters.
Posts: 4,780
Threads: 76
Joined: Jan 2018
May-14-2024, 12:51 PM
(This post was last modified: May-14-2024, 12:53 PM by Gribouillis.)
(May-14-2024, 12:29 PM)bfallert Wrote: there is a compilation error stating that groups are not supported in lookbehinds. Which version of Python are you using? It works fine here in Python 3.10. The latest Python is 3.12 as of may 2024. Groups are allowed in lookbehind assertions since Python 3.5 (2015).
« We can solve any problem by introducing an extra level of indirection »
Posts: 1,090
Threads: 143
Joined: Jul 2017
May-14-2024, 01:04 PM
(This post was last modified: May-14-2024, 01:06 PM by Pedroski55.)
Quote:I am looking for a regex pattern to find the instance of 3 matching characters in series in a string.
This finds all instances of "3 matching characters in series", as stated above.
Is that NOT what you want?
string = 'AAAABBBBCCCC11212222333444455555666666'
result = [match[0] for match in re.findall(r'((\w)\2{2,2})', string)] Output: result
['AAA', 'BBB', 'CCC', '222', '333', '444', '555', '666', '666']
Posts: 6,778
Threads: 20
Joined: Feb 2020
(May-14-2024, 12:27 PM)bfallert Wrote: (May-14-2024, 05:59 AM)deanhystad Wrote: import re
matches =[match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches) Output: ['AAA', 'bb', 'DDD', 'GGGG']
Thank you, but this expression still allows matching consecutive characters other than exactly 3. Doubles, quads or others beside triples should not be found.
Change the repeat count from {1,} to {2}.
|