 REGEX Look Arounds
#1
Hi, I am trying to learn and understand lookarounds. In the code below, why is only '1' removed and not the whole '123'?

import re

a = "learn@123Python456"
re.findall(r"\d+", a)   # ['123', '456']
re.findall(r"(?<!\W)\d+", a)   # ['23', '456']

Whereas if I use a positive lookbehind such as:

b = "@@@coding????isfun"
re.findall(r"\w+", b)   #['coding', 'isfun']
re.findall(r"(?<=\W)\w+", b)   #['coding', 'isfun']
all the characters are retained.

I was using IDLE to run the code

Actually this is a better example of a positive look behind

b = "@@@coding  isfun"
re.findall(r"\w+", b) #['coding', 'isfun']
re.findall(r"(?<=\s)\w+", b) #['isfun']
Any assistance will be appreciated, thanks.
#2
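In your third example, each match must be immediately preceded by a whitespace character. The lookbehind checks that character but does not include it in the match. "coding" follows "@", not whitespace, so it is ineligible. Only after the whitespace is a match possible, so "isfun" is returned.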
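A small sketch to check this for yourself (the variable names `positions` and `consumed` are only for illustration): re.finditer reports where each match starts, and comparing against a pattern that consumes the whitespace shows the lookbehind leaves it out of the result.

```python
import re

b = "@@@coding  isfun"

# The lookbehind inspects the character before the match but consumes nothing,
# so the match starts at the "i" of "isfun".
positions = [(m.start(), m.group()) for m in re.finditer(r"(?<=\s)\w+", b)]
print(positions)  # [(11, 'isfun')]

# Without the lookbehind, \s is part of the match and shows up in the result.
consumed = re.findall(r"\s\w+", b)
print(consumed)  # [' isfun']
```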

In your first example, a match must not start immediately after a "non-word" character. The first digit that could start a match is "1", but it follows "@", which is a non-word character, so that position is rejected. The next possible starting position is "2". That position is eligible (it follows "1", which is not matched by \W), so the first match begins there and \d+ captures "23".
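If the goal was to skip the whole "123" rather than just the "1", one possible approach (an assumption about what you want, not the only way) is to also forbid a digit before the match, so a match cannot begin in the middle of a run of digits:

```python
import re

a = "learn@123Python456"

# Original pattern: only the position right after "@" is rejected,
# so the engine retries one character later and matches "23".
original = re.findall(r"(?<!\W)\d+", a)
print(original)  # ['23', '456']

# Also reject positions preceded by a digit. Now "2" and "3" cannot
# start a match either, so the whole "123" run is skipped.
stricter = re.findall(r"(?<![\W\d])\d+", a)
print(stricter)  # ['456']
```

The character class [\W\d] is still exactly one character wide, so it is allowed inside a lookbehind.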