Recursive regular expressions in Python - Python Forum (https://python-forum.io), Python Coding > General Coding Help (https://python-forum.io/forum-8.html), thread /thread-40419.html
Recursive regular expressions in Python - risu252 - Jul-24-2023

Hi! Today I ran into a problem where I had to extract JSON that was passed inside a string in a YAML file (don't ask me why LOL). Curly braces indicate a parameter that should be captured, so the string Variable {var} should have the value { {"test": {"a": 1}} }, and the code should, let's say, add that value as an embedded object in a Python dictionary.

While researching how to write a regex that matches balanced curly braces, I ran into this one: \((?:[^()]|(?R))*\), which is supposed to match balanced parentheses. The way I see it, every time we encounter either an opening or a closing parenthesis, the recursive call is made and the regex is evaluated from the beginning (it goes back to trying to match an opening parenthesis at \(). However, if we also start a recursive call when we find a closing parenthesis, that recursive call will not match, because it won't start with an opening parenthesis. This doesn't make sense to me. What would make sense to me is if the closing parenthesis were not inside the [^()] character group, but then the regex no longer captures only balanced parentheses. Can someone please help me understand why?

P.S.: I'm using Python's regex module.
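One way to see what the engine does with \((?:[^()]|(?R))*\) is to write the same steps out by hand. Note that [^()] is a negated class: it matches any single character except ( and ), so listing ) there keeps closers out of the first alternative. The sketch below is a hypothetical hand-written mirror of that pattern (match_group and find_group are made-up names; this is not part of the regex module and not how it is implemented). It shows that (?R) can only succeed at an opening bracket, because its first token is \(. At a closing bracket the recursive attempt fails before consuming anything, the * loop ends, and the outer \) consumes that bracket. Every successful recursion consumes at least one bracket pair, so the recursion cannot go on forever. The last lines apply it to the braces example, assuming the parameter braces wrap exactly one JSON object.

```python
import json
from typing import Optional


def match_group(text: str, pos: int, open_ch: str = "(", close_ch: str = ")") -> Optional[int]:
    r"""Hypothetical hand-written mirror of the pattern \((?:[^()]|(?R))*\).

    Returns the index just past the group that starts at ``pos``, or None.
    """
    # \(  : the pattern, and therefore every recursive call, must start with an opener
    if pos >= len(text) or text[pos] != open_ch:
        return None
    pos += 1
    # (?:[^()]|(?R))*  : keep looping while one of the two alternatives matches
    while pos < len(text):
        if text[pos] not in (open_ch, close_ch):
            pos += 1  # alternative 1: [^()] eats any character that is not a bracket
            continue
        # alternative 2: (?R) restarts the whole pattern here. On a closer it fails
        # at its very first token (\( does not match ")") without consuming anything.
        end = match_group(text, pos, open_ch, close_ch)
        if end is None:
            break  # neither alternative matched, so the * loop stops
        pos = end
    # \)  : the closer the loop stopped on is consumed here, by the outer pattern
    if pos < len(text) and text[pos] == close_ch:
        return pos + 1
    return None


def find_group(text: str, open_ch: str = "(", close_ch: str = ")") -> Optional[str]:
    """Like a search: return the first balanced group found in ``text``, or None."""
    for start in range(len(text)):
        end = match_group(text, start, open_ch, close_ch)
        if end is not None:
            return text[start:end]
    return None


line = 'Variable { {"test": {"a": 1}} }'
group = find_group(line, "{", "}")
print(group)                               # { {"test": {"a": 1}} }
payload = json.loads(group[1:-1].strip())  # drop the parameter braces, parse the JSON
print(payload)                             # {'test': {'a': 1}}
```

With the regex module itself, regex.search(r"\{(?:[^{}]|(?R))*\}", line) should find the same span; the hand-written version is only there to make the order of attempts visible.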
RE: Recursive regular expressions in Python - deanhystad - Jul-25-2023

I'm pretty sure you saw the explanation here:
https://stackoverflow.com/questions/26385984/recursive-pattern-in-regex
I thought it was pretty good.

RE: Recursive regular expressions in Python - risu252 - Jul-25-2023

(Jul-25-2023, 11:07 AM) deanhystad Wrote: I'm pretty sure you saw the explanation here:

Hi! Yes, I came across that at some point. However, what I don’t understand is why we need the closing bracket in the character group. Why can’t the regex be \{(?:[^{]|(?R))*\}? Aren’t we starting a recursive call every time we encounter a closing bracket as well? Why is that desired? Shouldn’t we start a recursive call only when we find an opening bracket, and expect it to be matched by the \} part of the regex? I can’t understand why the regex with the closing bracket in the character group does not lead to infinite recursion: when we encounter the closing bracket, we make a recursive call, that closing bracket is not matched again, which leads to a new recursive call, and so on. What am I missing here?

Thanks a lot for replying.
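A small simulation can compare [^{}] with the variant that drops the closing brace from the class ([^{]). With [^{], a } is ordinary text for the first alternative, so the greedy * loop simply consumes it; no recursion is involved, because (?R) starts with \{ and can only succeed at a {. The loop therefore runs past the brace that should have closed the group, and backtracking only gives characters back until some } is available for the final \}, so unbalanced text such as {a}b} matches whole. The helpers below (group_ends, loop_then_close, search) are hypothetical names for a simplified backtracking model written for illustration, not the regex module's own matcher: they yield match ends in the order such an engine tries them (one more greedy iteration first, first alternative before second, then stop and try \}).

```python
from typing import FrozenSet, Iterator, Optional


def group_ends(text: str, pos: int, excluded: FrozenSet[str]) -> Iterator[int]:
    r"""Yield every end index at which \{(?:[^...]|(?R))*\} can match at ``pos``.

    ``excluded`` plays the role of the negated class:
    frozenset("{}") models [^{}], frozenset("{") models [^{].
    """
    if pos < len(text) and text[pos] == "{":  # \{
        yield from loop_then_close(text, pos + 1, excluded)


def loop_then_close(text: str, pos: int, excluded: FrozenSet[str]) -> Iterator[int]:
    # The * is greedy, so one more iteration is tried before stopping.
    if pos < len(text):
        if text[pos] not in excluded:  # alternative 1: the negated character class
            yield from loop_then_close(text, pos + 1, excluded)
        for end in group_ends(text, pos, excluded):  # alternative 2: (?R)
            yield from loop_then_close(text, end, excluded)
    if pos < len(text) and text[pos] == "}":  # stop looping, then match \}
        yield pos + 1


def search(text: str, excluded: FrozenSet[str]) -> Optional[str]:
    """Return the first match found scanning left to right, or None."""
    for start in range(len(text)):
        end = next(group_ends(text, start, excluded), None)
        if end is not None:
            return text[start:end]
    return None


print(search("{a}b}", frozenset("{}")))  # {a}    : [^{}] leaves the first } to \}
print(search("{a}b}", frozenset("{")))   # {a}b}  : [^{] swallows } as ordinary text
```

In this model the closing bracket in [^{}] is not what triggers recursion; it keeps the first alternative from eating closers, so the only thing allowed to consume a } is the \} at the end of the (possibly recursive) pattern.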