Python Forum (https://python-forum.io), General Coding Help: Adding some __magic__ to my parser combinator. (/thread-26200.html)
Adding some __magic__ to my parser combinator. - jeacom - Apr-23-2020

I am making a parser combinator... for the third time. This time I think it can actually do cool stuff, like parsing indentation-based grammars, but using it is too ugly. I want to add some syntax sugar so it doesn't look so daunting, but I am not finding many ways of doing it.

I have the following primitives so far, all of which are `Parser` subclasses and can be combined in arbitrary ways:

- `String(text)`
- `Regex(text)`
- `Tag(parser)`
- `Sequence(*parsers)`
- `Either(*parsers)`
- `Many(parser, min, max)`
- `SepBy(parser, sep_parser, min, max)`
- `Concatenate(*parsers)`
- `AnyIndent(indent_parser, skip)`
- `SameIndent(indent_parser, skip)`
- `IndentIncreased(indent_parser, skip)`
- `IndentDecreased(indent_parser, skip)`

With those classes I can do a lot, but the result is ugly as hell and looks too cryptic with all those nested class instances.

I already know that I can use some of the `__magic__` methods to implement operator overloads, for example:

- `parser_a + parser_b` becomes `Sequence(parser_a, parser_b)`
- `parser_a | parser_b` becomes `Either(parser_a, parser_b)`
- `parser_a + ...` becomes `Many(parser_a)`

and so on. But some constructs are not obvious, like `SepBy(parser_a, sep_parser, min=1)`. Is there even a way to represent that as a simple, "sweet" expression?

What do you think would be good ways to get rid of this weird-looking nested structure using `__magic__` methods?
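Here is a minimal, self-contained sketch of the operator mapping described above, plus one possible answer to the `SepBy` question: reading `parser % sep` as `SepBy(parser, sep, min=1)`. Everything in it is an assumption for illustration: the classes are toy stand-ins named after the primitives in the post (not the actual implementation), the `parse(text, pos)` interface returning `(value, new_pos)` or `None` is assumed, `coerce` is a hypothetical helper, and `max` is left out of `Many`/`SepBy` for brevity.

```python
import re


class Parser:
    """Toy base class (hypothetical): parse(text, pos) returns (value, new_pos) or None."""

    def __add__(self, other):  # a + b -> Sequence(a, b); a + ... -> Many(a)
        if other is Ellipsis:
            return Many(self)
        left = self.parsers if isinstance(self, Sequence) else [self]  # flatten a + b + c
        return Sequence(*left, coerce(other))

    def __or__(self, other):  # a | b -> Either(a, b)
        left = self.parsers if isinstance(self, Either) else [self]  # flatten a | b | c
        return Either(*left, coerce(other))

    def __mod__(self, sep):  # a % sep -> SepBy(a, sep, min=1), one possible spelling
        return SepBy(self, coerce(sep), min=1)


def coerce(obj):
    # Plain strings become literal matchers (an escaped Regex stands in for String here)
    return Regex(re.escape(obj)) if isinstance(obj, str) else obj


class Regex(Parser):
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def parse(self, text, pos=0):
        m = self.regex.match(text, pos)
        return (m.group(), m.end()) if m else None


class Sequence(Parser):
    def __init__(self, *parsers):
        self.parsers = list(parsers)

    def parse(self, text, pos=0):
        values = []
        for parser in self.parsers:  # every part must match, in order
            result = parser.parse(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos


class Either(Parser):
    def __init__(self, *parsers):
        self.parsers = list(parsers)

    def parse(self, text, pos=0):
        for parser in self.parsers:  # the first alternative that matches wins
            result = parser.parse(text, pos)
            if result is not None:
                return result
        return None


class Many(Parser):
    def __init__(self, parser, min=0):
        self.parser, self.min = parser, min

    def parse(self, text, pos=0):
        values = []
        while True:
            result = self.parser.parse(text, pos)
            if result is None or result[1] == pos:  # stop on failure or an empty match
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= self.min else None


class SepBy(Parser):
    def __init__(self, parser, sep, min=0):
        self.parser, self.sep, self.min = parser, sep, min

    def parse(self, text, pos=0):
        values = []
        result = self.parser.parse(text, pos)
        # keep going while items match; an empty separator + empty item pair ends the loop
        while result is not None and (not values or result[1] > pos):
            value, pos = result
            values.append(value)
            sep = self.sep.parse(text, pos)  # a separator counts only if an item follows it
            result = self.parser.parse(text, sep[1]) if sep else None
        return (values, pos) if len(values) >= self.min else None


word = Regex(r'[a-z]+')
print((word % ',').parse('a,bc,d'))  # (['a', 'bc', 'd'], 6)
```

Python's precedence works in favour of this choice: `%` binds tighter than `+`, which binds tighter than `|`, so `a + b | c` reads as `(a + b) | c` and `a + b % sep` as `a + (b % sep)`. One catch with `+ ...`: `a + b + ...` repeats the whole `a + b`, so write `a + (b + ...)` when only `b` should repeat. Adding `__radd__`/`__ror__` as well would let a plain string appear on the left of an operator.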
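```python
# Line break(s), including any blank lines, right before the next non-blank line
pseudo_indent = Regex(r'\s*\n(?=[ \t]*[^\s])')
# The indentation at the start of a line
indent = Regex(r'[ \t]*')

# Indentation checks; .discard() keeps them out of the parse result
any = AnyIndent(indent, skip=pseudo_indent).discard()
same = SameIndent(indent, skip=pseudo_indent).discard()
incr = IndentIncreased(indent, skip=pseudo_indent).discard()
decr = IndentDecreased(indent, skip=pseudo_indent).discard()

# r lets a rule refer to itself before it has been defined
r = RecursionContainer()
valid_val = Either('val', r.indent_parser_test)
# A group: 'grp', then an indented block of values, each further value at the same indentation
r.indent_parser_test = Concatenate('grp', incr, valid_val, Many(Sequence(same, valid_val,).capture(0)))

text = '''\
grp
    grp
        val
        grp
            val
            val
        val
    val
    val
'''

result = Sequence(any, r.indent_parser_test).capture(0).parse(text)
print(result)  # prints ['grp', ['grp', 'val', ['grp', 'val', 'val'], 'val'], 'val', 'val']
```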
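The `RecursionContainer` in the code above is what lets `valid_val` mention `r.indent_parser_test` before that rule exists. The post doesn't show how it is implemented, so the following is only a minimal sketch of how such a forward-reference container could be built with `__getattr__` and `__setattr__`, in keeping with the `__magic__` theme. `RuleRef`, `Lit`, `Seq` and `Alt` are hypothetical stand-ins, not the classes from the post.

```python
class RecursionContainer:
    """Hypothetical sketch: reading r.name yields a lazy reference to the rule 'name'."""

    def __init__(self):
        # bypass our own __setattr__ so _rules becomes a plain instance attribute
        object.__setattr__(self, '_rules', {})

    def __setattr__(self, name, parser):
        self._rules[name] = parser  # rules live in the dict, not in __dict__

    def __getattr__(self, name):
        # only called when normal lookup fails, which is the case for every rule name
        if name.startswith('_'):
            raise AttributeError(name)
        return RuleRef(self._rules, name)


class RuleRef:
    """Looks the rule up at parse time, so it may be assigned after being referenced."""

    def __init__(self, rules, name):
        self.rules, self.name = rules, name

    def parse(self, text, pos=0):
        if self.name not in self.rules:
            raise NameError(f'rule {self.name!r} was used but never defined')
        return self.rules[self.name].parse(text, pos)


# Tiny stand-in parsers (hypothetical): parse() returns (value, new_pos) or None.
class Lit:
    def __init__(self, text):
        self.text = text

    def parse(self, text, pos=0):
        if text.startswith(self.text, pos):
            return self.text, pos + len(self.text)
        return None


class Seq:
    def __init__(self, *parsers):
        self.parsers = parsers

    def parse(self, text, pos=0):
        values = []
        for parser in self.parsers:
            result = parser.parse(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos


class Alt:
    def __init__(self, *parsers):
        self.parsers = parsers

    def parse(self, text, pos=0):
        for parser in self.parsers:
            result = parser.parse(text, pos)
            if result is not None:
                return result
        return None


r = RecursionContainer()
# r.nested is read on the right-hand side before it is assigned, which yields a RuleRef
r.nested = Alt(Seq(Lit('('), r.nested, Lit(')')), Lit('x'))
print(r.nested.parse('((x))'))  # (['(', ['(', 'x', ')'], ')'], 5)
```

Because `__setattr__` stores rules in a dict instead of the instance `__dict__`, every read of `r.name` goes through `__getattr__` and stays lazy, so the order of definitions doesn't matter as long as every rule is assigned before parsing starts.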