I am making a parser combinator... for the third time. This time I think it can actually do cool stuff like parsing indentation-based grammars, but the way of using it is too ugly.
I want to add some syntactic sugar so it doesn't look as daunting, but I am not finding many ways of doing it. I have the following primitives so far, all of which are Parser subclasses and can be combined in arbitrary ways:

    String(text)
    Regex(text)
    Tag(parser)
    Sequence(*parsers)
    Either(*parsers)
    Many(parser, min, max)
    SepBy(parser, sep_parser, min, max)
    Concatenate(*parsers)
    AnyIndent(indent_parser, skip)
    SameIndent(indent_parser, skip)
    IndentIncreased(indent_parser, skip)
    IndentDecreased(indent_parser, skip)

With those classes I can do a lot of stuff, but it's ugly as hell and looks too cryptic with all those nested class instances.
I already know that I can use some of the __magic__ methods to implement some operator overloads like:

    parser_a + parser_b    becomes    Sequence(parser_a, parser_b)
    parser_a | parser_b    becomes    Either(parser_a, parser_b)
    parser_a + ...         becomes    Many(parser_a)

and so on, but some constructs are not obvious, like SepBy(parser_a, sep_parser, min=1). Is there even a way to represent this in a simple "sweet" expression?

What do you think would be good ways to get rid of this weird-looking nested structure using some __magic__ methods?
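
To make the three overloads listed above concrete, here is a minimal sketch of how they could be wired up. The classes are stand-ins that only build and print a tree; they are not the real implementation, the `_Group` helper is hypothetical, and the default values for `min` and `max` in `Many` are assumed.

```python
class Parser:
    """Stand-in base class: it only builds a tree, it does not parse."""

    def __add__(self, other):
        if other is Ellipsis:          # parser + ...
            return Many(self)
        return Sequence(self, other)   # parser_a + parser_b

    def __or__(self, other):
        return Either(self, other)     # parser_a | parser_b


class String(Parser):
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"String({self.text!r})"


class _Group(Parser):
    """Hypothetical helper shared by the n-ary stand-ins below."""

    def __init__(self, *parsers):
        self.parsers = parsers

    def __repr__(self):
        inner = ", ".join(map(repr, self.parsers))
        return f"{type(self).__name__}({inner})"


class Sequence(_Group):
    pass


class Either(_Group):
    pass


class Many(Parser):
    def __init__(self, parser, min=0, max=None):  # defaults are assumed
        self.parser, self.min, self.max = parser, min, max

    def __repr__(self):
        return f"Many({self.parser!r})"


a, b, c = String("a"), String("b"), String("c")
print(a + b)      # Sequence(String('a'), String('b'))
print(a | b)      # Either(String('a'), String('b'))
print(a + ...)    # Many(String('a'))
print(a + b | c)  # Either(Sequence(String('a'), String('b')), String('c'))
```

The last line relies on Python's operator precedence: `+` binds tighter than `|`, so a sequence inside an alternative needs no parentheses. Note that in this sketch `a + b + c` nests as `Sequence(Sequence(a, b), c)` rather than flattening into one three-element `Sequence`.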

    pseudo_indent = Regex(r'\s*\n(?=[ \t]*[^\s])')
    indent = Regex(r'[ \t]*')

    any = AnyIndent(indent, skip=pseudo_indent).discard()
    same = SameIndent(indent, skip=pseudo_indent).discard()
    incr = IndentIncreased(indent, skip=pseudo_indent).discard()
    decr = IndentDecreased(indent, skip=pseudo_indent).discard()

    r = RecursionContainer()
    valid_val = Either('val', r.indent_parser_test)
    r.indent_parser_test = Concatenate(
        'grp', incr, valid_val,
        Many(Sequence(same, valid_val).capture(0)))

    text = '''\
    grp
        grp
            val
            grp
                val
                val
            val
        val
        val
    '''

    result = Sequence(any, r.indent_parser_test).capture(0).parse(text)
    print(result)
    # prints ['grp', ['grp', 'val', ['grp', 'val', 'val'], 'val'], 'val', 'val']