Hi,
I am trying to get a non-greedy regex match.
I am writing:
import re
n = re.compile(r'(ha){3,5}?man')
mo = n.search('hahahahahaman')
mo.group()
but I don't understand why mo.group() gives 'hahahahahaman'.
I was expecting it to give 'hahahaman', because I added ? to make the match non-greedy.
I was confused by this as well, but it appears that 'non-greedy' means 'it takes as few repetitions as it can while still letting the rest of the pattern match.' The search still starts at the beginning of the string, and the pattern cannot finish until it reaches 'man', so the match includes all 5 ha's. If you reverse things:
n = re.compile(r'man(ha){3,5}?')
mo = n.search('manhahahahaha')
print(mo.group())
It matches 'manhahaha', stopping at three.
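To make the difference visible, here is a small sketch that runs the lazy and greedy versions of both patterns side by side. The variable names are only for illustration. When 'man' comes last, both versions give the same result, because the 'ha's have to run all the way up to 'man' either way.

```python
import re

# Quantifier is at the end of the pattern: lazy and greedy differ.
text = "manhahahahaha"
lazy = re.search(r"man(ha){3,5}?", text)
greedy = re.search(r"man(ha){3,5}", text)
print(lazy.group())    # manhahaha
print(greedy.group())  # manhahahahaha

# Quantifier is followed by 'man': both must reach 'man', so they agree.
question_text = "hahahahahaman"
lazy_q = re.search(r"(ha){3,5}?man", question_text)
greedy_q = re.search(r"(ha){3,5}man", question_text)
print(lazy_q.group())    # hahahahahaman
print(greedy_q.group())  # hahahahahaman
```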
Sorry, but your response does not make any sense to me.
I don't understand what you mean by 'stops as soon as possible'.
How would it know when to stop?
It collects 'ha's until the next thing it sees is 'man'. It has to collect between 3 and 5 'ha's, and it stops collecting as soon as the rest of the pattern can match:
- [ha]hahahahaman (one ha, needs at least three)
- [haha]hahahaman (two ha's, needs at least three)
- [hahaha]hahaman (three ha's, has enough ha's, wants to stop, but can't because the next characters aren't man)
- [hahahaha]haman (four ha's, still can't stop)
- [hahahahaha]man (five ha's, can stop looking for ha's because it has 3-5 of them and 'man' comes next)
- [hahahahahaman] (pattern matched, stops looking for patterns)
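The steps above can be sketched as a plain loop. This is only a rough imitation of what the regex engine does, not how the re module is implemented: lazy_ha_man is a hypothetical helper, and it only tries a match starting at the first character, whereas re.search would also go on to try later starting positions.

```python
import re

def lazy_ha_man(text, low=3, high=5):
    """Hypothetical helper: try the fewest 'ha's first, starting at index 0 only."""
    for count in range(low, high + 1):
        candidate = "ha" * count + "man"
        # Can we stop here? Only if 'man' follows exactly this many 'ha's.
        if text.startswith(candidate):
            return candidate
    return None  # no count between low and high is followed by 'man'

text = "hahahahahaman"
print(lazy_ha_man(text))                           # hahahahahaman
print(re.search(r"(ha){3,5}?man", text).group())   # hahahahahaman
print(lazy_ha_man("hahahaman"))                    # hahahaman
```

With only three 'ha's in front of 'man', the loop stops at the first count it tries, which is the lazy behaviour the question was expecting.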
OK, I kind of get it, thank you.
The problem here is that you're using group() on the match object, when you should probably be using groups(). group (singular) returns the entire matched text if you don't pass the index of the subgroup you want it to return, while groups returns a tuple of the matched subgroups.
>>> import re
>>> needle = r"(ha){3,5}man"
>>> haystack = "hahahahahahaman"
>>> match = re.search(needle, haystack)
>>> match.group()
'hahahahahaman'
>>> match.group(1)
'ha'
>>> match.groups()
('ha',)
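For comparison, here is a short sketch of the same calls on the lazy pattern from the question, plus a non-capturing variant. The variable names are illustrative. Note that group(1) holds only the last repetition of the repeated group, and that a pattern with no capturing groups gives an empty tuple from groups().

```python
import re

text = "hahahahahaman"

# Capturing group, lazy quantifier (the pattern from the question).
capturing = re.search(r"(ha){3,5}?man", text)
print(capturing.group())    # hahahahahaman  (whole match, same as group(0))
print(capturing.group(1))   # ha             (last repetition of the group)
print(capturing.groups())   # ('ha',)
print(capturing.span())     # (0, 13)

# Non-capturing group: same overall match, but no subgroups to report.
non_capturing = re.search(r"(?:ha){3,5}?man", text)
print(non_capturing.group())   # hahahahahaman
print(non_capturing.groups())  # ()
```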