Python Forum
Regex to find triple characters
#1
I am looking for a regex pattern to find the instance of 3 matching characters in series in a string. The characters can be letters or numbers, but the run must be exactly 3 characters long, all the same character. For example, the string 'ab999thc7' would result in '999' being found, but 'abddddthc7' would yield no match because the repeated character, 'd', runs for 4. Any assistance would be greatly appreciated.
Reply
#2
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']
Reply
#3
This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
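For anyone puzzling over the pattern above, here is the same expression written with re.VERBOSE so that each piece carries a comment. It is only an annotated sketch; the names TRIPLE and find_triples are made up for illustration and are not from the post.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
TRIPLE = re.compile(r"""
    (\w)        # group 1: first character of a candidate run
    (?<!\1\1)   # the character before it must differ, so we are at the start of a run
    \1\1        # two more copies of group 1, three in total
    (?!\1)      # no fourth copy may follow
""", re.VERBOSE)


def find_triples(text):
    """Return every run of exactly three identical word characters (hypothetical helper)."""
    return [m.group() for m in TRIPLE.finditer(text)]


print(find_triples("ab999thc7"))        # ['999']
print(find_triples("abddddthc7"))       # []
print(find_triples("AAAbbcDDDEDGGGG"))  # ['AAA', 'DDD']
```

Note that \w also matches the underscore. If only letters and digits should count, [^\W_] in place of \w is one option.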
« We can solve any problem by introducing an extra level of indirection »
Reply
#4
This seems to work, although it contains 3 times opening bracket, (, but only 2 times closing bracket, ), which is weird!

import re
string = 'AAABBCCCDDEEEFFGGGHHIIIJJKKK'
result = [match[0] for match in re.findall(r'((\w)\2{2,})', string)]
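A short sketch of what re.findall hands back for a pattern like this may help explain the match[0]. With two capturing groups, findall returns a tuple per match: the outer group holds the whole run and the inner group holds the single repeated character. The sample string and the names pairs and runs are assumptions chosen for illustration. Note also that {2,} is open-ended, so a run of four is returned whole.

```python
import re

string = 'AAABBCCCDDDD'

# Two capturing groups, so each findall item is a (whole_run, single_char) tuple.
pairs = re.findall(r'((\w)\2{2,})', string)
print(pairs)  # [('AAA', 'A'), ('CCC', 'C'), ('DDDD', 'D')]

# Keep only the outer group, which is what match[0] does in the post above.
runs = [whole for whole, char in pairs]
print(runs)   # ['AAA', 'CCC', 'DDDD']
```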
Reply
#5
(May-14-2024, 05:59 AM)deanhystad Wrote:
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still matches runs of lengths other than exactly 3. Doubles, quads, or anything besides triples should not be found.
Reply
#6
(May-14-2024, 08:14 AM)Gribouillis Wrote: This perhaps
>>> import re
>>> p = re.compile(r'(\w)(?<!\1\1)\1\1(?!\1)')
>>> 
>>> [m.group() for m in p.finditer('ab999thc7')]
['999']
>>> [m.group() for m in p.finditer('abddddthc7')]
[]
>>> [m.group() for m in p.finditer("AAAbbcDDDEDGGGG")]
['AAA', 'DDD']
Thank you, this solution does work, but there is a compilation error stating that groups are not supported in lookbehinds. Is there any way to clear that up? I appreciate it.
Reply
#7
Thank you, but this solution allows runs of more than 3 consecutive characters to slip through. I need the expression to specifically look for exactly 3 consecutive characters.
Reply
#8
(May-14-2024, 12:29 PM)bfallert Wrote: there is a compilation error stating that groups are not supported in lookbehinds.
Which version of Python are you using? It works fine here in Python 3.10. The latest Python is 3.12 as of May 2024. Group references of fixed length have been allowed in lookbehind assertions since Python 3.5 (2015).
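If the pattern is being compiled by something other than a recent CPython re module (an older interpreter, or an online tester set to a different regex flavor, which is only a guess), one way to sidestep the lookbehind entirely is to match whole runs and filter by length. This is a minimal sketch of that alternative, not the pattern from the earlier post; the names RUN and triples_without_lookbehind are hypothetical.

```python
import re

# One word character followed by any number of copies of itself: a full run.
RUN = re.compile(r"(\w)\1*")


def triples_without_lookbehind(text):
    """Return runs whose total length is exactly 3, using no lookaround at all."""
    return [m.group() for m in RUN.finditer(text) if len(m.group()) == 3]


print(triples_without_lookbehind("ab999thc7"))        # ['999']
print(triples_without_lookbehind("abddddthc7"))       # []
print(triples_without_lookbehind("AAAbbcDDDEDGGGG"))  # ['AAA', 'DDD']
```

Because each run is consumed whole before its length is checked, a run of four or more can never contribute a partial triple.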
Reply
#9
Quote: I am looking for a regex pattern to find the instance of 3 matching characters in series in a string.

This finds all instances of "3 matching characters in series", as stated above.

Is that NOT what you want?

import re
string = 'AAAABBBBCCCC11212222333444455555666666'
result = [match[0] for match in re.findall(r'((\w)\2{2,2})', string)]
print('result', result)
Output:
result ['AAA', 'BBB', 'CCC', '222', '333', '444', '555', '666', '666']
Reply
#10
(May-14-2024, 12:27 PM)bfallert Wrote:
(May-14-2024, 05:59 AM)deanhystad Wrote:
import re
matches = [match.group() for match in re.finditer(r"(.)\1{1,}", "AAAbbcDDDEDGGGG")]
print(matches)
Output:
['AAA', 'bb', 'DDD', 'GGGG']

Thank you, but this expression still matches runs of lengths other than exactly 3. Doubles, quads, or anything besides triples should not be found.

Change the repeat count from {1,} to {2}.
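One caveat worth checking before relying on {2} alone: it rules out doubles, but it still matches the first three characters of a longer run. The sketch below compares it with a variant that adds a negative lookbehind and a negative lookahead, which is one way (assuming Python 3.5 or later) to require exactly three. The variable names loose and strict are just for illustration.

```python
import re

text = "AAAbbcDDDEDGGGG"

# {2} alone: doubles are gone, but 'GGGG' still yields 'GGG'.
loose = [m.group() for m in re.finditer(r"(.)\1{2}", text)]
print(loose)   # ['AAA', 'DDD', 'GGG']

# Lookarounds added: the run may not be preceded or followed by the same character.
strict = [m.group() for m in re.finditer(r"(.)(?<!\1\1)\1{2}(?!\1)", text)]
print(strict)  # ['AAA', 'DDD']
```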
Reply

