views: 449
answers: 6
+1  A: 

Lambda the Ultimate discussed a parser that allows syntax extensions.

I'm planning to write a compiler that would allow syntax extensions (some kind of compile-time metaprogramming). I don't want a very powerful system, so I've thought of just having:

{syntax: while (condition) do code}
while (condition, code) => // actual execution

and replacing every pattern that matches the syntax with a call to the function. However, I don't know where to start to get the lexer and parser running, because the usual tools such as Flex/Bison or ANTLR (I would like to write the compiler in C#) don't seem to allow this.

Could you give me any pointers on where to go next? I've also read that Scheme or Haskell might be better suited to this task. And of course, I'm open to any suggestions about how to actually implement this.
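For illustration only, the rewrite idea above can be sketched in a few lines of Python. This is a naive, hypothetical text-level substitution (a real compiler would match against a token stream or AST, not raw source, and `while_` is an invented target name):

```python
import re

# Hypothetical sketch: rewrite "while (cond) do body" into a plain
# function call "while_(cond, body)". A real implementation would
# operate on tokens or an AST rather than raw source text.
SYNTAX_RULE = re.compile(r"while\s*\((?P<cond>[^)]*)\)\s*do\s*(?P<body>\w+)")

def expand(source):
    # Replace every occurrence of the surface syntax with a call.
    return SYNTAX_RULE.sub(r"while_(\g<cond>, \g<body>)", source)

print(expand("while (x < 10) do step"))  # while_(x < 10, step)
```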

Mark Cidade
+1  A: 

Yes, of course!

In any dynamic language this is simple to achieve, because code can easily be generated and evaluated at runtime. I'll recommend two alternatives:

  • In Perl, use Parse::RecDescent. It takes its grammar from a string, and you can definitely ask it to generate a new parser from a new string at runtime.
  • In Python, consider PLY. You can easily generate the rule functions with their docstrings at runtime and run PLY on them.
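To make the PLY suggestion concrete: PLY discovers each grammar rule from the rule function's docstring, and docstrings can be assigned at runtime. A minimal sketch of that trick in plain Python (the `make_rule` helper is hypothetical, not part of PLY, and no actual PLY parser is built here):

```python
# PLY reads each rule's grammar from the function's __doc__, so rule
# functions can be manufactured at runtime from strings.
def make_rule(name, production, action):
    def rule(p):
        p[0] = action(p)
    rule.__name__ = name
    rule.__doc__ = production  # PLY would read the grammar from here
    return rule

# Build a rule dynamically, as you might from a grammar loaded at runtime.
p_expr = make_rule("p_expr", "expr : expr PLUS term", lambda p: p[1] + p[3])
print(p_expr.__doc__)  # expr : expr PLUS term
```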

I personally recommend the Python option, though it may not be relevant if you know Perl but not Python.

For completeness, I should note that you can do this with Lex & Yacc as well, but it's hairy. You'd have to generate a Lex/Yacc file from your grammar at runtime, compile it to C, compile that into a shared library, and load it at runtime. It sounds like science fiction, but some tools actually do this when they need both efficiency and dynamism.

Good luck.

Eli Bendersky
+1  A: 

JFlex, a successor to JLex for Java, lets you do runtime compilation of lexers, but it is pretty hairy stuff.

Josh
+2  A: 

Take a look at parser combinators, which I think may help you. It is possible to build parsers at runtime using this technique. One popular parser combinator library is Parsec, which uses Haskell as its host language. From the Parsec documentation:

Combinator parsers are written and used within the same programming language as the rest of the program. There is no gap between the grammar formalism (Yacc) and the actual programming language used (C)

Parsers are first-class values within the language. They can be put into lists, passed as parameters and returned as values. It is easy to extend the available set of parsers with custom-made parsers specific to a certain problem

If you are using .NET, take a look at the parser combinator library for F#.
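To make the "parsers are first-class values" point concrete, here is a minimal combinator sketch in plain Python (not Parsec or its F# port, just the underlying idea): a parser is a function from input to a `(value, rest)` pair or `None`, and combinators build new parsers out of existing ones.

```python
# A parser is a function: str -> (value, remaining_input) | None.

def char(c):
    # Parser that matches a single expected character.
    def parse(s):
        return (c, s[1:]) if s.startswith(c) else None
    return parse

def seq(p1, p2):
    # Run p1, then p2 on the leftover input; pair up the results.
    def parse(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return parse

def alt(p1, p2):
    # Try p1; if it fails, try p2 on the same input.
    def parse(s):
        return p1(s) or p2(s)
    return parse

# Parsers are ordinary values: build them at runtime, pass them around.
ab = seq(char("a"), char("b"))
a_or_b = alt(char("a"), char("b"))
print(ab("abc"))      # (('a', 'b'), 'c')
print(a_or_b("bcd"))  # ('b', 'cd')
```

Because the grammar is just ordinary function composition, nothing stops you from assembling these combinators from data read at runtime, which is exactly what the question asks for.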

Jonas
A: 

What are you going to parse? In C or C++ there is no parser available at runtime, so you would need an additional library; the same is true of many other languages.

Any parser is 'dynamic' by default when you implement it yourself. Even in C.

If the language you are going to parse is your own: writing parsers is a skill to learn in its own right. Even with parser generators it is real work. Once you've learned it, though, it becomes fairly simple. Special tricks like indentation-based syntax will still be tricky, and you will need good, extensive tests to verify that the parser does what you want. I've written a parser, so I know.

Cheery
Python indenting meets HTML? Not a bad idea.
Ellery Newcomer
A: 

If Java suits you better, there is a port of the Haskell Parsec library: JParsec. Very powerful, though the documentation isn't great.

You can coerce it into a straightforward lex-then-parse pipeline, but you can also do some interesting things with dynamic lexing and dynamic grammars.

Head twisting stuff.

Because it's all in Java (your parser is a POJO), you can refactor, do TDD, and whatever else you're used to doing in Java. This is a major advantage over the more traditional ANTLR/JavaCC/JJTree approach.

jamesh