Python Forum
Adding some __magic__ to my parser combinator.
I am writing a parser combinator... for the third time. This time I think it can actually do cool things, like parsing indentation-based grammars, but using it is far too ugly.
I want to add some syntactic sugar so it doesn't look as daunting, but I am not finding many ways to do it. So far I have the following primitives, all of which are Parser subclasses and can be combined in arbitrary ways:

String(text)
Regex(text)
Tag(parser)
Sequence(*parsers)
Either(*parsers)
Many(parser, min, max)
SepBy(parser, sep_parser, min, max)
Concatenate(*parsers)
AnyIndent(indent_parser, skip)
SameIndent(indent_parser, skip)
IndentIncreased(indent_parser, skip)
IndentDecreased(indent_parser, skip)
With those classes I can do a lot, but it's ugly as hell and looks too cryptic with all those nested class instances.
I already know I can use some of the __magic__ methods to implement operator overloads, for example:
parser_a + parser_b becomes Sequence(parser_a, parser_b), parser_a | parser_b becomes Either(parser_a, parser_b), parser_a + ... becomes Many(parser_a), and so on. Some constructs are not obvious, though, such as SepBy(parser_a, sep_parser, min=1). Is there even a way to represent that as a simple "sweet" expression?
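The operator mapping described above can be sketched as a tiny standalone toy. This is not the poster's library: the `parse_at(text, pos)` protocol (returning `(value, new_pos)` or `None`) and the `as_parser` helper are hypothetical names chosen for illustration. It shows `__add__`/`__radd__` for Sequence, `__or__`/`__ror__` for Either, and the trick of accepting Python's `Ellipsis` literal (`...`) as the right operand of `+` to build Many.

```python
# Toy combinator, NOT the poster's library: parse_at(text, pos) is a hypothetical
# protocol returning (value, new_pos) on success or None on failure.


class Parser:
    def parse(self, text):
        result = self.parse_at(text, 0)
        if result is None:
            raise ValueError("no match")
        return result[0]

    # parser_a + parser_b -> Sequence(parser_a, parser_b)
    # parser_a + ...      -> Many(parser_a), using the Ellipsis literal as a marker
    def __add__(self, other):
        if other is Ellipsis:
            return Many(self)
        return Sequence(self, as_parser(other))

    # 'text' + parser -> Sequence(String('text'), parser), so string
    # literals can appear on the left-hand side too.
    def __radd__(self, other):
        return Sequence(as_parser(other), self)

    # parser_a | parser_b -> Either(parser_a, parser_b)
    def __or__(self, other):
        return Either(self, as_parser(other))

    def __ror__(self, other):
        return Either(as_parser(other), self)


def as_parser(obj):
    # Hypothetical helper: plain str literals stand in for String(...).
    return obj if isinstance(obj, Parser) else String(obj)


class String(Parser):
    def __init__(self, text):
        self.text = text

    def parse_at(self, text, pos):
        if text.startswith(self.text, pos):
            return self.text, pos + len(self.text)
        return None


class Sequence(Parser):
    def __init__(self, *parsers):
        # Flatten so a + b + c is Sequence(a, b, c), not Sequence(Sequence(a, b), c).
        self.parsers = []
        for p in parsers:
            self.parsers.extend(p.parsers if isinstance(p, Sequence) else [p])

    def parse_at(self, text, pos):
        values = []
        for p in self.parsers:
            result = p.parse_at(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos


class Either(Parser):
    def __init__(self, *parsers):
        self.parsers = parsers  # tried in order; first success wins

    def parse_at(self, text, pos):
        for p in self.parsers:
            result = p.parse_at(text, pos)
            if result is not None:
                return result
        return None


class Many(Parser):
    def __init__(self, parser, min=0, max=None):
        self.parser, self.min, self.max = parser, min, max

    def parse_at(self, text, pos):
        values = []
        while self.max is None or len(values) < self.max:
            result = self.parser.parse_at(text, pos)
            if result is None or result[1] == pos:  # failure or zero-width match
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= self.min else None


greeting = ('hello' | String('hi')) + ' ' + 'world'
print(greeting.parse('hi world'))  # ['hi', ' ', 'world']
print((String('ab') + ...).parse('ababx'))  # ['ab', 'ab']
```

One design choice worth weighing: flattening inside `Sequence` makes `a + b + c` produce one flat result list, which reads naturally but means an explicit grouping has to be expressed some other way (for instance with a separate wrapper like the existing `Concatenate` or `.capture`). Also note that `+` is left-associative, so `a + b + ...` becomes `Many(Sequence(a, b))`, not `Sequence(a, Many(b))`.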

What do you think would be good ways to get rid of this weird-looking nested structure using some __magic__ methods?
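For the SepBy question, one possible answer is subscript syntax: Python passes `p[1:, sep]` to `__getitem__` as the tuple `(slice(1, None, None), sep)`, so `SepBy(parser_a, sep_parser, min=1)` could be spelled `parser_a[1:, sep_parser]`, with `parser_a[1:]` for `Many(parser_a, min=1)`. The sketch below is a hypothetical standalone toy; the class names mirror the thread's primitives, but the `parse_at` protocol, `as_parser`, and the exact semantics are assumptions.

```python
# Hypothetical slice sugar for repetition bounds and separators. The class names
# mirror the thread's primitives, but this is a standalone toy, not that library.


class Parser:
    # Opt out of the legacy __getitem__-based iteration protocol, so that
    # iter(parser) or a for-loop raises TypeError instead of calling p[0], p[1], ...
    __iter__ = None

    def parse(self, text):
        result = self.parse_at(text, 0)
        if result is None:
            raise ValueError("no match")
        return result[0]

    # p[n]        -> Many(p, min=n, max=n)
    # p[lo:hi]    -> Many(p, min=lo, max=hi)       (omitted hi = unbounded)
    # p[lo:hi, s] -> SepBy(p, s, min=lo, max=hi)
    def __getitem__(self, key):
        sep = None
        if isinstance(key, tuple):
            key, sep = key
        if isinstance(key, int):
            key = slice(key, key)
        if not isinstance(key, slice) or key.step is not None:
            raise TypeError("use p[n], p[lo:hi] or p[lo:hi, sep]")
        lo, hi = key.start or 0, key.stop
        if sep is None:
            return Many(self, lo, hi)
        return SepBy(self, as_parser(sep), lo, hi)


def as_parser(obj):
    # Hypothetical helper: plain str literals stand in for String(...).
    return obj if isinstance(obj, Parser) else String(obj)


class String(Parser):
    def __init__(self, text):
        self.text = text

    def parse_at(self, text, pos):
        if text.startswith(self.text, pos):
            return self.text, pos + len(self.text)
        return None


class Many(Parser):
    def __init__(self, parser, min=0, max=None):
        self.parser, self.min, self.max = parser, min, max

    def parse_at(self, text, pos):
        values = []
        while self.max is None or len(values) < self.max:
            result = self.parser.parse_at(text, pos)
            if result is None or result[1] == pos:  # failure or zero-width match
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= self.min else None


class SepBy(Parser):
    def __init__(self, parser, sep, min=0, max=None):
        self.parser, self.sep, self.min, self.max = parser, sep, min, max

    def parse_at(self, text, pos):
        values = []
        while self.max is None or len(values) < self.max:
            start = pos
            if values:  # every item after the first must be preceded by a separator
                sep_result = self.sep.parse_at(text, pos)
                if sep_result is None:
                    break
                start = sep_result[1]
            result = self.parser.parse_at(text, start)
            # Stop on failure (leaving a dangling separator unconsumed) or no progress.
            if result is None or (values and result[1] == pos):
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= self.min else None


item = String('x')
print(item[1:, ','].parse('x,x,x'))   # ['x', 'x', 'x']
print(item[1:2, ','].parse('x,x,x'))  # ['x', 'x']
print(item[2:].parse('xxx'))          # ['x', 'x', 'x']
```

Two wrinkles in this design: the slice stop is treated as an inclusive maximum (`p[1:2]` allows up to two repetitions), unlike list slicing where the stop is exclusive, and separator values are discarded rather than returned. A binary operator such as `%` would work just as well if subscripts feel too clever.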

    pseudo_indent = Regex(r'\s*\n(?=[ \t]*[^\s])')
    indent = Regex(r'[ \t]*')
    any = AnyIndent(indent, skip=pseudo_indent).discard()
    same = SameIndent(indent, skip=pseudo_indent).discard()
    incr = IndentIncreased(indent, skip=pseudo_indent).discard()
    decr = IndentDecreased(indent, skip=pseudo_indent).discard()

    r = RecursionContainer()

    valid_val = Either('val',
                       r.indent_parser_test)

    r.indent_parser_test = Concatenate('grp',
                                       incr,
                                       valid_val,
                                       Many(Sequence(same,
                                                     valid_val,).capture(0)))

    text = '''\
    grp
        grp
            val
            grp
                val
                val
            val
        val
        val
    '''

    result = Sequence(any, r.indent_parser_test).capture(0).parse(text)
    print(result)
    # prints ['grp', ['grp', 'val', ['grp', 'val', 'val'], 'val'], 'val', 'val']
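The `RecursionContainer` in the example presumably hands out a lazy reference so that `r.indent_parser_test` can be used inside `valid_val` before it is assigned. A possibly sweeter alternative is a single placeholder object completed later with an augmented-assignment operator, in the spirit of pyparsing's `Forward` and its `<<=`. The sketch below is a hypothetical standalone toy (the `Forward` class, `parse_at` protocol, and `nested` grammar are illustrative assumptions, not the poster's code); in the thread's terms it would read roughly as `indent_block = Forward()`, then `valid_val = Either('val', indent_block)`, then `indent_block <<= Concatenate(...)`.

```python
# Hypothetical forward declaration for recursive grammars, in the spirit of
# pyparsing's Forward and its <<= operator; a standalone toy, not that library.


class Parser:
    def parse(self, text):
        result = self.parse_at(text, 0)
        if result is None:
            raise ValueError("no match")
        return result[0]


class String(Parser):
    def __init__(self, text):
        self.text = text

    def parse_at(self, text, pos):
        if text.startswith(self.text, pos):
            return self.text, pos + len(self.text)
        return None


class Sequence(Parser):
    def __init__(self, *parsers):
        self.parsers = parsers

    def parse_at(self, text, pos):
        values = []
        for p in self.parsers:
            result = p.parse_at(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos


class Either(Parser):
    def __init__(self, *parsers):
        self.parsers = parsers

    def parse_at(self, text, pos):
        for p in self.parsers:
            result = p.parse_at(text, pos)
            if result is not None:
                return result
        return None


class Forward(Parser):
    """Placeholder that other parsers can reference before it is defined."""

    def __init__(self):
        self.target = None

    # `block <<= definition` fills in the placeholder after the fact.
    def __ilshift__(self, parser):
        self.target = parser
        return self  # keep the name bound to this same Forward object

    def parse_at(self, text, pos):
        if self.target is None:
            raise RuntimeError("Forward parser used before it was defined")
        return self.target.parse_at(text, pos)


# nested := 'x' | '(' nested ')'
nested = Forward()
nested <<= Either(String('x'), Sequence(String('('), nested, String(')')))
print(nested.parse('((x))'))  # ['(', ['(', 'x', ')'], ')']
```

Because this is plain recursive descent, a left-recursive definition such as `expr <<= Sequence(expr, String('+'), ...)` would recurse without consuming input and end in a `RecursionError`; such rules need to be written right-recursively or with repetition (Many/SepBy) instead.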