Python Forum
REGEX Look Arounds
#1
Hi, I am trying to learn and understand lookarounds. In the code below, why is only the '1' dropped from the second result instead of the whole '123'?

import re

a = "learn@123Python456"
re.findall(r"\d+", a)          # ['123', '456']
re.findall(r"(?<!\W)\d+", a)   # ['23', '456']
Whereas if I use a positive lookbehind such as:

b = "@@@coding????isfun"
re.findall(r"\w+", b)   #['coding', 'isfun']
re.findall(r"(?<=\W)\w+", b)   #['coding', 'isfun']
all the characters are retained.

I was using IDLE to run the code.

Actually, this is a better example of a positive lookbehind:

b = "@@@coding  isfun"
re.findall(r"\w+", b) #['coding', 'isfun']
re.findall(r"(?<=\s)\w+", b) #['isfun']
Any assistance will be appreciated, thanks.
#2
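In your third example, a match can only start at a position immediately preceded by a whitespace character. The lookbehind checks that character but does not include it in the match. "coding" is preceded by "@", not whitespace, so it is ineligible. "isfun" directly follows a space, so it is returned.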
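To see that the lookbehind only checks the preceding character without consuming it, here is a small sketch comparing it with a pattern that matches the whitespace directly. The variable names `consumed` and `looked` are just mine for illustration.

```python
import re

b = "@@@coding  isfun"

# \s is part of the pattern itself, so the space ends up in the result
consumed = re.findall(r"\s\w+", b)

# (?<=\s) only asserts that a space precedes the match start;
# the span shows the match begins at the "i" of "isfun"
looked = [(m.group(), m.span()) for m in re.finditer(r"(?<=\s)\w+", b)]

print(consumed)  # [' isfun']
print(looked)    # [('isfun', (11, 16))]
```

The span starts at index 11, which is the "i", so the space at index 10 was inspected but never became part of the match.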

In your first example, a match must not start immediately after a "non-word" character. The first digit that could start a match is "1", but it directly follows "@", which is a non-word character, so that position is rejected. The next candidate is "2". It follows "1", which is not matched by \W, so the match begins there and \d+ consumes "23".
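If it helps, the same reasoning can be traced position by position. This is only a sketch: the helper list `allowed` is my own name, and the last pattern is one possible way to skip the whole "123", assuming that is what you wanted.

```python
import re

a = "learn@123Python456"

# For every digit, record whether (?<!\W) would let a match START there,
# i.e. whether the previous character is NOT a non-word character
allowed = []
for i, ch in enumerate(a):
    if ch.isdigit():
        prev = a[i - 1] if i else ""
        allowed.append((i, ch, re.fullmatch(r"\W", prev) is None))

print(allowed)
# [(6, '1', False), (7, '2', True), (8, '3', True),
#  (15, '4', True), (16, '5', True), (17, '6', True)]

# The actual matches start at the first allowed position of each run
starts = [(m.group(), m.start()) for m in re.finditer(r"(?<!\W)\d+", a)]
print(starts)  # [('23', 7), ('456', 15)]

# Also forbidding a preceding digit stops the match from restarting mid-number
strict = re.findall(r"(?<![\W\d])\d+", a)
print(strict)  # ['456']
```

With the stricter lookbehind, "2" and "3" are rejected as well because each follows a digit, so the whole "123" is skipped rather than just the "1".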
#3
Thank you, a bit confusing but now I understand

