What is the best way to build a parser in C# to parse my own language? Ideally I'd like to provide a grammar and get Abstract Syntax Trees as output. Many thanks, Nestor

A: 

You could study the source code for the Mono C# compiler.

Robert Harvey
Thanks. I'm not trying to write a parser for C#, but a parser for my own language, with the parser itself written in C#. Thanks very much for your suggestion, though.
Nestor
+11  A: 

I've had good experience with ANTLR v3. By far the biggest benefit is that it lets you write LL(*) parsers with arbitrary lookahead. These can be quite suboptimal in terms of performance, but you can write the grammar in the most straightforward and natural way, with no need to refactor it to work around parser limitations. Parser performance is often not a big deal anyway (I hope you aren't writing a C++ compiler), especially in learning projects.
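
To make the lookahead point concrete, here is a minimal hand-written sketch (not ANTLR output) for a hypothetical toy grammar in which both alternatives of stmt can begin with an identifier, so one token of lookahead is not enough to choose between them. The parser peeks at the second token to decide; ANTLR's LL(*) analysis makes this kind of decision for you. The class and method names are made up for illustration, and the returned strings stand in for real AST nodes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy grammar:
//   stmt : ID '=' expr | expr ;
//   expr : atom ('+' atom)* ;
//   atom : ID | NUM ;
// Both stmt alternatives can start with ID, so one token of lookahead
// cannot pick the alternative; peeking at the second token does.
public class StmtParser
{
    private readonly List<string> _tokens;
    private int _pos;

    public StmtParser(IEnumerable<string> tokens) => _tokens = new List<string>(tokens);

    // Token k positions ahead of the current one, or "<EOF>" past the end.
    private string Peek(int k) => _pos + k < _tokens.Count ? _tokens[_pos + k] : "<EOF>";

    // Only called after Peek has confirmed a token is present.
    private string Next() => _tokens[_pos++];

    private static bool IsIdent(string t) => t.Length > 0 && char.IsLetter(t[0]);
    private static bool IsNumber(string t) => t.Length > 0 && char.IsDigit(t[0]);

    public string ParseStatement()
    {
        string result;
        // LL(2) decision: an identifier followed by '=' starts an assignment.
        if (IsIdent(Peek(0)) && Peek(1) == "=")
        {
            string name = Next();
            Next(); // consume '='
            result = $"assign({name}, {ParseExpr()})";
        }
        else
        {
            result = ParseExpr();
        }
        if (Peek(0) != "<EOF>")
            throw new FormatException($"Unexpected token '{Peek(0)}'");
        return result;
    }

    private string ParseExpr()
    {
        string left = ParseAtom();
        while (Peek(0) == "+")
        {
            Next(); // consume '+'
            left = $"add({left}, {ParseAtom()})";
        }
        return left;
    }

    private string ParseAtom()
    {
        string t = Peek(0);
        if (!IsIdent(t) && !IsNumber(t))
            throw new FormatException($"Unexpected token '{t}'");
        return Next();
    }
}
```

With this sketch, the tokens x = a + 1 come out as assign(x, add(a, 1)), while x + 1 falls through to the expression branch and gives add(x, 1).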

It also provides a pretty good way to construct meaningful ASTs without writing any code: for every grammar production, you mark the "crucial" token or sub-production (in ANTLR v3 syntax, with a ^ suffix), and that becomes a tree node. Alternatively, you can write a tree rewrite rule for the production.
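
Roughly, marking the operator as the root means a rule like expr : term ('+'^ term)* turns 1 + 2 * 3 into a tree with + at the top. The sketch below is a hand-written C# equivalent of what such rules produce, assuming pre-split string tokens and a hypothetical Node class; it is not the code ANTLR generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node: the "crucial" token labels the node, sub-results are its children.
public sealed class Node
{
    public string Token { get; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string token, params Node[] children)
    {
        Token = token;
        Children.AddRange(children);
    }

    // LISP-style rendering, e.g. (+ 1 (* 2 3)); string.Join calls ToString on each child.
    public override string ToString() =>
        Children.Count == 0 ? Token : $"({Token} {string.Join(" ", Children)})";
}

// Hand-written equivalent of ANTLR-style rules
//   expr : term ('+'^ term)* ;
//   term : NUM ('*'^ NUM)* ;
// where the operator marked with ^ becomes the root of its subtree.
public sealed class TreeBuilder
{
    private readonly string[] _tokens;
    private int _pos;

    public TreeBuilder(params string[] tokens) => _tokens = tokens;

    private string Peek() => _pos < _tokens.Length ? _tokens[_pos] : "<EOF>";

    public Node Parse()
    {
        Node tree = ParseExpr();
        if (_pos != _tokens.Length)
            throw new FormatException($"Unexpected token '{Peek()}'");
        return tree;
    }

    private Node ParseExpr()
    {
        Node left = ParseTerm();
        while (Peek() == "+")
        {
            string op = _tokens[_pos++];
            // The operator becomes the root; the tree built so far is its left child.
            left = new Node(op, left, ParseTerm());
        }
        return left;
    }

    private Node ParseTerm()
    {
        Node left = ParseNumber();
        while (Peek() == "*")
        {
            string op = _tokens[_pos++];
            left = new Node(op, left, ParseNumber());
        }
        return left;
    }

    private Node ParseNumber()
    {
        string t = Peek();
        if (!int.TryParse(t, out _))
            throw new FormatException($"Expected a number but found '{t}'");
        _pos++;
        return new Node(t);
    }
}
```

For example, new TreeBuilder("1", "+", "2", "*", "3").Parse() prints as (+ 1 (* 2 3)), and repeated + operators nest to the left, as in (+ (+ 1 2) 3).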

Have a look at the following ANTLR grammars (listed in order of increasing complexity) to get the gist of how it looks and feels:

Pavel Minaev
Thanks! Good suggestion
Nestor
I've edited the answer to add some links to examples. Take a look at the JSON grammar in particular; it shows how to customize the output AST without writing any code.
Pavel Minaev
Thanks again Pavel. Very useful pointers.
Nestor
+1  A: 

While they are still in early beta, the Oslo Modeling language and MGrammar tools from Microsoft are showing some promise.

Mike Two
Yes, I've seen that. I like the editor (it adjusts the syntax highlighting to my grammar), but it looks a bit early to use it.
Nestor
@Nestor - I agree that it is too early to use it, but I thought it deserved a mention.
Mike Two
+1 for your comment. It's true. I would use Oslo if it were in its final release.
Nestor
+7  A: 

I've played with Irony. It looks simple and useful.

Ball
One nice aspect of Irony (at least I thought so) is that you can write your grammar in a .NET language such as C#.
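
To show why a grammar written in C# reads nicely, here is a tiny self-contained combinator sketch. It is not Irony's API (Irony has its own Grammar, Terminal and NonTerminal classes); it only borrows the same trick of overloading + for sequence and | for alternation, plus an implicit conversion from string literals. It is a recognizer only (it says whether the input matches, without building a tree), and like any naive top-down approach it cannot handle left-recursive rules. The class names and the toy grammar are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy combinators (NOT Irony's API): each rule maps a start position to
// every position where a match of that rule could end.
public abstract class Rule
{
    public abstract IEnumerable<int> Match(string input, int pos);

    // True if some way of matching the rule consumes the entire input.
    public bool Accepts(string input) => Match(input, 0).Contains(input.Length);

    // These operators are what let the grammar read like BNF in C#.
    public static implicit operator Rule(string literal) => new Lit(literal);
    public static Rule operator +(Rule a, Rule b) => new Seq(a, b); // sequence
    public static Rule operator |(Rule a, Rule b) => new Alt(a, b); // alternation
}

internal sealed class Lit : Rule
{
    private readonly string _text;
    public Lit(string text) => _text = text;

    public override IEnumerable<int> Match(string input, int pos)
    {
        if (pos + _text.Length <= input.Length && input.Substring(pos, _text.Length) == _text)
            yield return pos + _text.Length;
    }
}

internal sealed class Seq : Rule
{
    private readonly Rule _a, _b;
    public Seq(Rule a, Rule b) { _a = a; _b = b; }

    // Try b from every position where a could end (this is the backtracking).
    public override IEnumerable<int> Match(string input, int pos) =>
        _a.Match(input, pos).SelectMany(mid => _b.Match(input, mid));
}

internal sealed class Alt : Rule
{
    private readonly Rule _a, _b;
    public Alt(Rule a, Rule b) { _a = a; _b = b; }

    public override IEnumerable<int> Match(string input, int pos) =>
        _a.Match(input, pos).Concat(_b.Match(input, pos));
}

// Matches a single character from the given set, e.g. any digit.
public sealed class CharSet : Rule
{
    private readonly string _chars;
    public CharSet(string chars) => _chars = chars;

    public override IEnumerable<int> Match(string input, int pos)
    {
        if (pos < input.Length && _chars.IndexOf(input[pos]) >= 0)
            yield return pos + 1;
    }
}

// Forward reference so a rule can mention itself; Body is assigned later.
public sealed class Ref : Rule
{
    public Rule Body { get; set; }
    public override IEnumerable<int> Match(string input, int pos) => Body.Match(input, pos);
}

public static class ToyGrammar
{
    // expr : atom '+' expr | atom ;   atom : digit | '(' expr ')' ;
    public static Rule Build()
    {
        var expr = new Ref();
        Rule atom = new CharSet("0123456789") | "(" + expr + ")";
        expr.Body = atom + "+" + expr | atom;
        return expr;
    }
}
```

With this sketch, ToyGrammar.Build().Accepts("1+(2+3)") is true and Accepts("1+") is false. The grammar is right-recursive on purpose, since a left-recursive rule such as expr : expr '+' atom would recurse forever here.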
saret
A: 

I would also take a look at SableCC. It's very easy to write the EBNF grammar. Here is a simple C# calculator example.

SwDevMan81
A: 

Lex and yacc are still my favorites. They're obscure if you're just starting out, but extremely simple, fast, and easy once you've got the lingo down.

You can make them do whatever you want: generate C# code, build other grammars, emulate instructions, and so on.

It's not pretty: it's a text-based format, and yacc generates LALR(1) parsers, so your syntax has to accommodate that.

On the plus side, it's everywhere. There are great O'Reilly books about it, lots of sample code, lots of premade grammars, and lots of native language libraries.
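
For anyone who hasn't used lex: a scanner spec is a list of regular expressions, each paired with a token type. At every position the longest match wins, and a tie goes to the rule listed first, which is how a keyword like if beats the identifier rule while iffy is still an identifier. Below is a minimal C# sketch of that idea using System.Text.RegularExpressions; it is not generated by lex, and the token names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// lex-style scanner sketch: longest match wins, ties go to the earlier rule.
// \G anchors each pattern at the position passed to Regex.Match(input, pos).
public static class MiniLex
{
    private static readonly (string Type, Regex Pattern)[] Rules =
    {
        ("IF",     new Regex(@"\Gif")),
        ("IDENT",  new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*")),
        ("NUMBER", new Regex(@"\G[0-9]+")),
        ("OP",     new Regex(@"\G(==|[=+*()-])")),
        ("SKIP",   new Regex(@"\G\s+")), // whitespace: matched but not emitted
    };

    public static List<(string Type, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Type, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            string bestType = "";
            int bestLen = 0;
            foreach (var (type, regex) in Rules)
            {
                Match m = regex.Match(input, pos);
                // Strictly longer only, so an earlier rule keeps a tie.
                if (m.Success && m.Length > bestLen)
                {
                    bestType = type;
                    bestLen = m.Length;
                }
            }
            if (bestLen == 0)
                throw new FormatException($"Unexpected character '{input[pos]}' at {pos}");
            if (bestType != "SKIP")
                tokens.Add((bestType, input.Substring(pos, bestLen)));
            pos += bestLen;
        }
        return tokens;
    }
}
```

For the input if iffy == 10 this yields IF, IDENT (iffy), OP (==) and NUMBER (10). A yacc-generated parser would then consume such a token stream.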

davenpcj
A: 

There's a short paper here on constructing an LL(1) parser. Of course, you could use a generator too.
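
As a rough idea of what a hand-built LL(1) parser looks like, here is a minimal table-driven recognizer in C# for a hypothetical toy expression grammar. The table maps (nonterminal, next token) to the production to expand, and an explicit stack replaces recursion. It only accepts or rejects the input; a real parser would also build tree nodes as it expands productions.

```csharp
using System.Collections.Generic;

// Table-driven LL(1) recognizer for the toy grammar
//   E  -> T E'
//   E' -> + T E' | (empty)
//   T  -> id | ( E )
public static class LL1
{
    private static readonly HashSet<string> NonTerminals = new HashSet<string> { "E", "E'", "T" };

    // (nonterminal, lookahead) -> right-hand side to expand.
    private static readonly Dictionary<(string, string), string[]> Table =
        new Dictionary<(string, string), string[]>
        {
            [("E", "id")] = new[] { "T", "E'" },
            [("E", "(")] = new[] { "T", "E'" },
            [("E'", "+")] = new[] { "+", "T", "E'" },
            [("E'", ")")] = new string[0], // empty production, chosen on FOLLOW(E') = { ), $ }
            [("E'", "$")] = new string[0],
            [("T", "id")] = new[] { "id" },
            [("T", "(")] = new[] { "(", "E", ")" },
        };

    // tokens must not include the end marker; "$" is appended here.
    public static bool Accepts(IEnumerable<string> tokens)
    {
        var input = new List<string>(tokens) { "$" };
        var stack = new Stack<string>();
        stack.Push("$");
        stack.Push("E");
        int pos = 0;

        while (stack.Count > 0)
        {
            string top = stack.Pop();
            string look = input[pos];
            if (NonTerminals.Contains(top))
            {
                if (!Table.TryGetValue((top, look), out string[] rhs))
                    return false; // no table entry: syntax error
                // Push the right-hand side reversed so its first symbol is on top.
                for (int i = rhs.Length - 1; i >= 0; i--)
                    stack.Push(rhs[i]);
            }
            else if (top == look)
            {
                pos++; // terminal (or the "$" end marker) matches: consume it
            }
            else
            {
                return false; // terminal mismatch
            }
        }
        return pos == input.Count;
    }
}
```

For example, the tokens id + ( id + id ) are accepted, while id + and ( id are rejected. Building the table by hand means computing FIRST and FOLLOW sets, which is exactly the bookkeeping a generator does for you.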

Longpoke