Python Forum

Full Version: Regex
Hi,

I am trying to achieve a non-greedy Regex match.

I am writing

import re

n = re.compile(r'(ha){3,5}?man')
mo = n.search('hahahahahaman')
print(mo.group())
but I don't understand why mo.group() gives 'hahahahahaman'.

I was expecting it to give 'hahahaman', because I put ? after the quantifier to make the match non-greedy.
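I was confused by this as well, but 'non-greedy' means the quantifier repeats as few times as it can while still letting the rest of the pattern match. It does not mean 'the shortest match anywhere in the string.' The search starts at the beginning of the string, and the pattern can only finish once 'man' comes next. From the start of this string, that takes all 5 ha's. If you reverse things: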

import re

n = re.compile(r'man(ha){3,5}?')
mo = n.search('manhahahahaha')
print(mo.group())
It matches 'manhahaha', stopping at three.
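To make the difference concrete, here is a small sketch comparing the greedy and non-greedy forms on both strings from this thread. The variable names are just mine for illustration. The last lines use a fixed {3} to show that a shorter match does exist further along the string, but search() reports the leftmost match it can complete.

```python
import re

text = 'hahahahahaman'  # five 'ha' followed by 'man'

# With 'man' required after the quantifier, lazy and greedy agree here:
# starting at index 0, only five repetitions let 'man' match next.
lazy = re.search(r'(ha){3,5}?man', text)
greedy = re.search(r'(ha){3,5}man', text)
print(lazy.group())    # hahahahahaman
print(greedy.group())  # hahahahahaman
print(lazy.start())    # 0

# With nothing required after the quantifier, the two forms differ.
tail = 'manhahahahaha'
lazy_tail = re.search(r'man(ha){3,5}?', tail)
greedy_tail = re.search(r'man(ha){3,5}', tail)
print(lazy_tail.group())    # manhahaha
print(greedy_tail.group())  # manhahahahaha

# Exactly three repetitions cannot match at index 0, so the engine
# moves forward until it finds a start position that works.
shortest = re.search(r'(ha){3}man', text)
print(shortest.group(), shortest.start())  # hahahaman 4
```

So if 'hahahaman' is the result you want from that string, one option is to ask for exactly three repetitions rather than relying on the ? modifier.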
Sorry, but your response does not make sense to me.
I don't understand what you mean by 'stops as soon as possible'.
How would it know when to stop?
It collects 'ha's until it gets to a 'man'. It will collect 3 to 5 'ha's. It will stop collecting 'ha's as soon as possible.
  • [ha]hahahahaman (one ha, needs at least three)
  • [haha]hahahaman (two ha's, needs at least three)
  • [hahaha]hahaman (three ha's, has enough ha's, wants to stop, but can't because the next characters aren't man)
  • [hahahaha]haman (four ha's, still can't stop)
  • [hahahahaha]man (five ha's, can stop looking for ha's because it has 3-5 of them and 'man' comes next)
  • [hahahahahaman] (pattern matched, stops looking for patterns)
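The steps above can be sketched in plain Python. This is only a rough imitation of what the non-greedy quantifier does from the first character of the string, not how the re module is actually implemented, and lazy_counts_tried is a made-up helper name.

```python
import re

def lazy_counts_tried(text, unit='ha', tail='man', lo=3, hi=5):
    """Hypothetical helper: try the fewest repetitions first, from index 0 only."""
    tried = []
    for n in range(lo, hi + 1):
        tried.append(n)
        candidate = unit * n + tail
        # The quantifier may only stop at n if the tail follows right away.
        if text.startswith(candidate):
            return tried, candidate
    return tried, None

tried, matched = lazy_counts_tried('hahahahahaman')
print(tried)    # [3, 4, 5]
print(matched)  # hahahahahaman

# Same result as the real non-greedy pattern on this string.
assert matched == re.search(r'(ha){3,5}?man', 'hahahahahaman').group()
```

A greedy quantifier would try the counts in the opposite order (5, then 4, then 3), which is why both forms end up with the same match here.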
ok, kind of get it, thank you.
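Part of the confusion may be group versus groups on the match object. group (singular), called without an index, returns the entire matched substring (group 0), not the captured subgroup; pass an index to get a particular subgroup. groups returns a tuple of the captured subgroups. Note that a repeated group such as (ha){3,5} only keeps its last repetition, so the subgroup is just 'ha'. The haystack below has six ha's, so the match starts at the second one.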

>>> import re
>>> needle = r"(ha){3,5}man"
>>> haystack = "hahahahahahaman"
>>> match = re.search(needle, haystack)
>>> match.group()
'hahahahahaman'
>>> match.group(1)
'ha'
>>> match.groups()
('ha',)
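If you want the subgroup to hold the whole run of ha's rather than only the last one, one option (my suggestion, not something from the posts above) is to wrap a non-capturing repeat in a capturing group. A minimal sketch using the same haystack:

```python
import re

haystack = 'hahahahahahaman'  # six 'ha' followed by 'man'

# (?:ha) repeats without capturing; the outer parentheses capture the run.
match = re.search(r'((?:ha){3,5})man', haystack)
print(match.group(0))  # hahahahahaman
print(match.group(1))  # hahahahaha
print(match.groups())  # ('hahahahaha',)
print(match.span())    # (2, 15)
```

The span shows the match starts at index 2: at most five ha's are allowed, so the first of the six is skipped.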