views:

479

answers:

5

I want to build a parser for a C like language. The interesting aspect about it is that I want to build it in such a way that someone who has access to the source can easily modified it to extend the language (a new expression type of instance) with the extensions being runtime configurable (they can be turned on and off).

My current intent is to build a recursive decent parser as an object. Each production will be a method of an object. The method of extension will be to derive classes from this base replacing methods (and production definitions) as needed. I'm still trying to figure out how to mix and match extensions. One idea is to play games with the v-tbl. Objects would be constructed with a v-tbl that is a copy of the base but with methods replaced from derived classes.

Aside from the bit-twiddling nature of the solution the only issues I have with it is

  • a reasonable way to do the v-tbl mixup
  • what to do when 2 extensions alter the same productions (as most replacements will end up calling the original having one replacement call the other would work but the mechanics of setting this up are the issue)
  • how to allow the extension of extensions (this might end up looking like a standard MI system, but I've never got how they work)

Another solution (a slightly more mundane version of the same same approach) would be to use static member variables to store function-pointers and call them for the same effect.

Edit: I have already built a system that lets me build productions from BNF definitions. I can alter it to support whatever I decide on.

+1  A: 

These are some of the challenges the Perl 6 design effort has faced. You may find it worthwhile looking into some of the solutions they came up with. Or you may find that to be gross overkill.

ysth
A: 

If I recall my university courses correctly, recursive descent parsers have some limitations that might bite you, especially since you're allowing extensions - somebody elses language extension could cause issues.

A proper compiler toolkit - such as the open source ANTLR - might make things easier, and might also provide some different approaches for you.

Bevan
I am (painfully) aware of the limitations of Recursive decent (and LL in general) :(. I have tried ANTLR and never really got it working right. While it might be usable, I'd probably have to alter it's internals to use it.
BCS
A: 

another option is to express the parsing rules in XML or something, instead of in code; less efficient, but far more dynamically configurable; each language or variant can just use its own (XML) file, and even include/reference other files as 'base' files...

Steven A. Lowe
Also far more complex in some ways. I wold have to parse the XML (or whatever) and then build a parser from it that can be immediately run. The advantage would be less 'code' but to make that work I would need to avoid needing to write code for the actions in the grammar (Ouch).
BCS
@[BCS]: you would just need a way of specifying them in the XML (e.g. assembly name, classname, method name, parm bindings) that would allow you to execute them dynamically. I have done this (in C), and it's not simple, but it is very flexible.
Steven A. Lowe
A: 

Frankly, I am not even sure I understood everything you wrote... :-)

But when I see parser and flexibility, I think about LPeg - Parsing Expression Grammars For Lua. It might not fit your needs but it is well worth a look... ;-)

PhiLho
A: 

I made a configurable parser I uploadei it some time ago at http://code.google.com/p/compparser/ The project there is not up-to-date but is working fine.

Raffaele