Python Forum
Regex
#1
Hi,

I am trying to achieve a non-greedy Regex match.

I am writing:

import re

n = re.compile(r'(ha){3,5}?man')
mo = n.search('hahahahahaman')
print(mo.group())
But I don't understand why mo.group() gives 'hahahahahaman'.

I was expecting it to give 'hahahaman' because I added ? to make the match non-greedy.
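For reference, here is a self-contained version of the snippet above next to its greedy counterpart (the variable names are only illustrative). On this particular input both patterns return the same match starting at index 0, so the ? is not being ignored. It just has no visible effect here.

```python
import re

text = 'hahahahahaman'

# Greedy: (ha){3,5} takes as many ha's as it can (up to 5), then needs 'man'.
greedy = re.search(r'(ha){3,5}man', text)
# Lazy: (ha){3,5}? tries 3 ha's first, then 4, then 5, until 'man' follows.
lazy = re.search(r'(ha){3,5}?man', text)

print(greedy.group(), greedy.span())  # hahahahahaman (0, 13)
print(lazy.group(), lazy.span())      # hahahahahaman (0, 13)
```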
#2
I was confused by this as well, but it appears that 'non-greedy' means 'stop as soon as possible.' The stopping point here is 'man', so the match starts at the beginning and stops as soon as it reaches 'man'. Getting there takes five ha's. If you reverse things:

import re

n = re.compile(r'man(ha){3,5}?')
mo = n.search('manhahahahaha')
print(mo.group())
It matches 'manhahaha', stopping at three.
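A minimal side-by-side sketch of the reversed pattern, greedy versus lazy (names are illustrative only). When nothing follows the quantifier, the lazy version can stop at the minimum of three, while the greedy version takes all five:

```python
import re

text = 'manhahahahaha'  # 'man' followed by five ha's

# Greedy: nothing follows the quantifier, so it takes the maximum (5 ha's).
greedy = re.search(r'man(ha){3,5}', text)
# Lazy: nothing follows, so the minimum (3 ha's) already completes the match.
lazy = re.search(r'man(ha){3,5}?', text)

print(greedy.group())  # manhahahahaha
print(lazy.group())    # manhahaha
```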
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
Sorry, but your response does not make any sense to me.
I don't understand what you mean by 'stops as soon as possible'.
How would it know when to stop?
#4
It collects 'ha's until it gets to a 'man'. It will collect 3 to 5 'ha's. It will stop collecting 'ha's as soon as possible.
  • [ha]hahahahaman (one ha, needs at least three)
  • [haha]hahahaman (two ha's, needs at least three)
  • [hahaha]hahaman (three ha's, has enough ha's, wants to stop, but can't because the next characters aren't man)
  • [hahahaha]haman (four ha's, still can't stop)
  • [hahahahaha]man (five ha's, can stop looking for ha's because it has 3-5 of them and 'man' comes next)
  • [hahahahahaman] (pattern matched, stops looking for patterns)
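A rough sketch of the steps above, assuming a hypothetical helper lazy_steps() that only mimics the engine for this one pattern (it is not how the re module works internally). The missing piece is that laziness only controls how many ha's are taken from a given start position; the engine still tries start positions from left to right and returns the first one that succeeds, so the match starting at index 0 wins even though a shorter match starts at index 4.

```python
import re

text = 'hahahahahaman'
pattern = re.compile(r'(ha){3,5}?man')

# Hypothetical helper: from one fixed start position, try 3, then 4, then 5
# repetitions of 'ha' and accept the first count that is followed by 'man'.
def lazy_steps(s, start=0, low=3, high=5):
    for count in range(low, high + 1):
        end = start + 2 * count
        if s[start:end] != 'ha' * count:
            return None      # ran out of ha's: no match from this start
        if s.startswith('man', end):
            return count     # smallest count that lets 'man' follow
    return None

print(lazy_steps(text))  # 5 (counts 3 and 4 are not followed by 'man')

# Leftmost start wins: the real search succeeds at index 0.
print(pattern.search(text).span())      # (0, 13)
# Starting the search at index 4 finds the shorter match.
print(pattern.search(text, 4).group())  # hahahaman

# If exactly three ha's is what you want, use a fixed count instead.
exact = re.search(r'(ha){3}man', text)
print(exact.group())  # hahahaman
```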
#5
OK, I kind of get it, thank you.
#6
The problem here is that you're using match.group() when you should probably be using match.groups(). Called without an index, group() returns the entire matched text (group 0), not the whole source string; pass an index to get a specific subgroup. groups() returns a tuple of all the matched subgroups.

>>> import re
>>> needle = r"(ha){3,5}man"
>>> haystack = "hahahahahahaman"
>>> match = re.search(needle, haystack)
>>> match.group()
'hahahahahaman'
>>> match.group(1)
'ha'
>>> match.groups()
('ha',)
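A small sketch of the group()/groups() difference using the same haystack as the session above (names are illustrative). Note that a capturing group inside a repetition only keeps its last repetition, which is why group(1) is just 'ha'. Wrapping the whole repetition in an outer group, with a non-capturing (?:ha) inside, captures the full run of ha's instead.

```python
import re

haystack = 'hahahahahahaman'  # six ha's, then 'man'

# (ha) repeated: the group only remembers the LAST 'ha' it matched.
m = re.search(r'(ha){3,5}man', haystack)
print(m.group(0))  # hahahahahaman (the match, shorter than the haystack)
print(m.group(1))  # ha
print(m.groups())  # ('ha',)

# Outer capturing group around a non-capturing repetition.
m2 = re.search(r'((?:ha){3,5})man', haystack)
print(m2.group(1))  # hahahahaha
print(m2.groups())  # ('hahahahaha',)
```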