Python Forum

Full Version: Adding some __magic__ to my parser combinator.
I am making a parser combinator... for the third time. This time I think it can actually do cool stuff like parsing indentation-based grammars, but the way of using it is too ugly.
I want to add some syntactic sugar to it so it doesn't look as daunting, but I am not finding many ways of doing it. I have the following primitives so far, all of which are Parser subclasses and can be combined in arbitrary ways:

String(text)
Regex(text)
Tag(parser)
Sequence(*parsers)
Either(*parsers)
Many(parser, min, max)
SepBy(parser, sep_parser, min, max)
Concatenate(*parsers)
AnyIndent(indent_parser, skip)
SameIndent(indent_parser, skip)
IndentIncreased(indent_parser, skip)
IndentDecreased(indent_parser, skip)
With those classes I can do a lot of stuff, but it's ugly as hell and looks too cryptic with all those nested class instances.
I already know that I can use some of the __magic__ methods to implement operator overloads, for example:
parser_a + parser_b becomes Sequence(parser_a, parser_b), parser_a | parser_b becomes Either(parser_a, parser_b), parser_a + ... becomes Many(parser_a), and so on. But some constructs are not obvious, like SepBy(parser_a, sep_parser, min=1). Is there even a way to represent this in a simple "sweet" expression?
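
To make the overloads above concrete, here is a minimal sketch of how they could be wired up. The classes below are stand-ins that only build and print the combinator tree; they are not the real Parser classes, and the lift() helper and the _Combined base class are hypothetical names introduced just for this illustration.

```python
class Parser:
    """Stand-in base class (assumption: the real Parser also does the parsing)."""

    def __add__(self, other):
        # parser + ...    -> Many(parser)
        # parser + other  -> Sequence(parser, other)
        if other is Ellipsis:
            return Many(self)
        return Sequence(self, other)

    def __radd__(self, other):
        # Called for 'text' + parser, because str cannot add a Parser itself.
        return Sequence(other, self)

    def __or__(self, other):
        return Either(self, other)

    def __ror__(self, other):
        return Either(other, self)


def lift(value):
    """Hypothetical helper: turn a bare string into a String parser."""
    return String(value) if isinstance(value, str) else value


class String(Parser):
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"String({self.text!r})"


class _Combined(Parser):
    """Hypothetical shared base for combinators that take *parsers."""

    def __init__(self, *parsers):
        self.parsers = [lift(p) for p in parsers]

    def __repr__(self):
        args = ", ".join(repr(p) for p in self.parsers)
        return f"{type(self).__name__}({args})"


class Sequence(_Combined):
    pass


class Either(_Combined):
    pass


class Many(Parser):
    def __init__(self, parser):
        self.parser = lift(parser)

    def __repr__(self):
        return f"Many({self.parser!r})"


a, b = String("a"), String("b")
print(a + b)        # builds a Sequence of the two parsers
print(a | "x")      # the bare string is lifted to String('x')
print(a + ...)      # Ellipsis on the right selects Many
print("grp" + a)    # __radd__ handles a string on the left
```

Since + binds tighter than | in Python, a + b | c groups as Either(Sequence(a, b), c) without parentheses. In this sketch a chain such as a + b + c produces nested Sequence objects rather than one flat Sequence; whether to flatten is a design choice that depends on how capture() and discard() should behave.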

What do you think would be good ways to get rid of this weird looking nested structure using some __magic__ methods?
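
One possible direction for the SepBy case, offered as a sketch rather than a settled answer: use % for "separated by" and slice syntax for the min/max bounds, so that SepBy(parser_a, sep_parser, min=1) could be spelled (parser_a % sep_parser)[1:]. The classes are again stand-ins that only record their arguments, the _bounds() helper is hypothetical, and the defaults min=0 and max=None are assumptions.

```python
def _bounds(key):
    """Hypothetical helper: turn parser[lo:hi] or parser[n] into (min, max)."""
    if isinstance(key, slice):
        return key.start or 0, key.stop
    return key, key  # parser[3] is read as "exactly three times"


class Parser:
    """Stand-in base class, not the real one."""

    # Defining __getitem__ would otherwise make instances iterable through
    # the legacy sequence protocol; setting __iter__ to None disables that.
    __iter__ = None

    def __mod__(self, sep):
        # item % sep -> SepBy(item, sep)
        return SepBy(self, sep)

    def __getitem__(self, key):
        # item[1:] -> Many(item, min=1, max=None)
        lo, hi = _bounds(key)
        return Many(self, min=lo, max=hi)


class String(Parser):
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"String({self.text!r})"


class Many(Parser):
    def __init__(self, parser, min=0, max=None):
        self.parser, self.min, self.max = parser, min, max

    def __repr__(self):
        return f"Many({self.parser!r}, min={self.min}, max={self.max})"


class SepBy(Parser):
    def __init__(self, parser, sep_parser, min=0, max=None):
        self.parser, self.sep_parser = parser, sep_parser
        self.min, self.max = min, max

    def __getitem__(self, key):
        # Slicing a SepBy adjusts its bounds instead of wrapping it in Many.
        lo, hi = _bounds(key)
        return SepBy(self.parser, self.sep_parser, min=lo, max=hi)

    def __repr__(self):
        return (f"SepBy({self.parser!r}, {self.sep_parser!r}, "
                f"min={self.min}, max={self.max})")


item, comma = String("x"), String(",")
print((item % comma)[1:])   # one or more items separated by commas
print(item[2:5])            # between two and five repetitions
```

A named method such as parser_a.sep_by(sep_parser, min=1) (hypothetical name) would be a less magical alternative, and it would match the fluent style that discard() and capture() already use.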

    # Whitespace and blank lines up to the newline before the next non-blank line.
    pseudo_indent = Regex(r'\s*\n(?=[ \t]*[^\s])')
    # The indentation itself: leading spaces or tabs.
    indent = Regex(r'[ \t]*')
    # Indentation matchers (any_indent avoids shadowing the built-in any()).
    any_indent = AnyIndent(indent, skip=pseudo_indent).discard()
    same = SameIndent(indent, skip=pseudo_indent).discard()
    incr = IndentIncreased(indent, skip=pseudo_indent).discard()
    decr = IndentDecreased(indent, skip=pseudo_indent).discard()

    # Lets a parser be referenced before it is defined (for recursive grammars).
    r = RecursionContainer()

    valid_val = Either('val',
                       r.indent_parser_test)

    r.indent_parser_test = Concatenate('grp',
                                       incr,
                                       valid_val,
                                       Many(Sequence(same,
                                                     valid_val).capture(0)))

    text = '''\
    grp
        grp
            val
            grp
                val
                val
            val
        val
        val
    '''

    result = Sequence(any_indent, r.indent_parser_test).capture(0).parse(text)
    print(result)
    # prints ['grp', ['grp', 'val', ['grp', 'val', 'val'], 'val'], 'val', 'val']