REGEX Look Arounds

CharlesKL · (This post was last modified: May-23-2020, 08:17 PM by CharlesKL.)

Hi, I am trying to learn and understand look arounds, In the code below why is '1' removed and not !123!

a = "learn@123Python456"
re.findall(r"\d+", a)   #['123', '456']
re.findall(r"(?<!\W)\d+", a)   #['23', '456']

while if I use a positive look behind such as:

b = "@@@coding????isfun"
re.findall(r"\w+", b)   #['coding', 'isfun']
re.findall(r"(?<=\W)\w+", b)   #['coding', 'isfun']

All the characters are retained

I was using IDLE to run the code

Actually this is a better example of a positive look behind

b = "@@@coding  isfun"
re.findall(r"\w+", b) #['coding', 'isfun']
re.findall(r"(?<=\s)\w+", b) #['isfun']

any assistance will be appreciated, thanks

bowlofred · May-23-2020, 10:50 PM

In your third example, matches must start with an (uncaptured) whitespace. So "coding" is ineligible. Only after the whitespace are matches possible, so "isfun" is returned.

In your first example, matches must not start immediately following a "non-word" character. The first possible digit to capture in the string is "1", but that digit does follow a non-word character. So that position is passed. The next possible match starts with "2". Since that is eligible (it follows a "1" which is not part of \W), the first match group begins there.

CharlesKL · May-26-2020, 12:30 PM

Thank you, a bit confusing but now I understand

REGEX Look Arounds

User Panel Messages

Announcements