I'd like to understand how to construct a parser in .NET to process source files. For example, I could begin by learning how to parse SQL, HTML, or CSS, and then use the parse results to reformat the source for readability or something similar.

Where can I learn how to do this? Are there specific books I can refer to? Do I need to learn about lexers/parsers?

I'm looking for resources specifically for the .NET platform, since I'm comfortable in C#.

+2  A: 

ANTLR :)

It's a good way to learn about grammars and parsers.

Keith Nicholas
The "downside" (in this case only) of ANTLR is that it will produce the parser for you, in one of several target languages. Cool, that's usually the goal, but in this situation, SevenCentral will miss out on the educational value of writing one from scratch [for a simple grammar].
mjv
A handy tool for developing complex parsers (it can even emit C# code), but perhaps not the best approach if you want to learn the background and basics.
Noldorin
I think it's a good way to start: you get a working knowledge of what's possible in terms of parsers, and then you can dig deeper and build your own if you still want to, perhaps a simple recursive descent parser.
Keith Nicholas
+13  A: 

I personally found this article, Grammars and Parsing with C# 2.0, a great introduction on writing lexers/parsers, with examples specifically relating to C#.

I wrote a brief blog post praising it not long ago. The nice thing is that it's very much aimed at complete beginners to parsing theory (it gives background on the theory as well as the implementation), and it takes matters in gradual steps. Of course, if you want to go on to learn the more advanced ideas of the field, you will need various other resources, but I think this is an excellent foundation.

Noldorin
+1. That one looks nice. And definitely a better start than diving right into some very complicated code.
Joey
This was great. A nice introduction, good examples and followed through with some C# to tie it all together!
SevenCentral
+2  A: 

If you do want to learn how to write the parser, this might not be your answer, but if you just want to parse and work with the parse results, you should definitely look at Irony.net. It's a toolkit that helps you implement languages (with .NET).

andyp
A: 

Hi, even though this may look a bit too advanced, take a look at monadic parser combinators. There's a great blog post on LukeH's WebLog here:

http://blogs.msdn.com/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx

Once you get the basics, they make for very clear parser definitions.
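To give a rough flavour of the idea, here is a minimal sketch of parser combinators in C#. It is not the code from the linked post: the names Result, Parser, Parsers and SumGrammar are made up for illustration, and the grammar (a sum of non-negative integers with no whitespace) is just an assumed toy example. Defining Select and SelectMany on the parser type is what lets the grammar be written with LINQ query syntax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only.
// A successful parse: the value produced plus the input that is left over.
public sealed class Result<T>
{
    public Result(T value, string rest) { Value = value; Rest = rest; }
    public T Value { get; }
    public string Rest { get; }
}

// A parser is a function from input to a Result, or null on failure.
public delegate Result<T> Parser<T>(string input);

public static class Parsers
{
    // Consumes one character if it satisfies the predicate.
    public static Parser<char> Satisfy(Func<char, bool> predicate) =>
        input => input.Length > 0 && predicate(input[0])
            ? new Result<char>(input[0], input.Substring(1))
            : null;

    // Ordered choice: try the first parser, fall back to the second.
    public static Parser<T> Or<T>(this Parser<T> first, Parser<T> second) =>
        input => first(input) ?? second(input);

    // One or more repetitions of a parser (which must consume input).
    public static Parser<List<T>> Many1<T>(this Parser<T> parser) =>
        input =>
        {
            var items = new List<T>();
            var rest = input;
            for (var r = parser(rest); r != null; r = parser(rest))
            {
                items.Add(r.Value);
                rest = r.Rest;
            }
            return items.Count > 0 ? new Result<List<T>>(items, rest) : null;
        };

    // Select and SelectMany enable LINQ query syntax over parsers.
    public static Parser<U> Select<T, U>(this Parser<T> parser, Func<T, U> selector) =>
        input =>
        {
            var r = parser(input);
            return r == null ? null : new Result<U>(selector(r.Value), r.Rest);
        };

    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> parser, Func<T, Parser<U>> next, Func<T, U, V> project) =>
        input =>
        {
            var first = parser(input);
            if (first == null) return null;
            var second = next(first.Value)(first.Rest);
            if (second == null) return null;
            return new Result<V>(project(first.Value, second.Value), second.Rest);
        };
}

public static class SumGrammar
{
    // number := digit+
    static readonly Parser<int> Number =
        from digits in Parsers.Satisfy(c => c >= '0' && c <= '9').Many1()
        select int.Parse(new string(digits.ToArray()));

    // sum := number '+' sum / number
    public static Result<int> Sum(string input) =>
        (from left in Number
         from plus in Parsers.Satisfy(c => c == '+')
         from right in (Parser<int>)Sum
         select left + right)
        .Or(Number)(input);
}
```

With these definitions, SumGrammar.Sum("1+20+3") returns a result whose Value is 24 and whose Rest is empty, while SumGrammar.Sum("x") returns null. A real library would also track positions and error messages, which this sketch leaves out.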

Nicolas Buduroi
A: 

The parser for my programming language, Heron, is in C# and was intended to be easy to reuse and understand. It is a recursive descent PEG parsing library. You can download the source code here. To see how to express grammars for grammars, you can look at this file. I have also described how the parsing library works in a recent blog post here.
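For readers who have not seen recursive descent before, the general technique can be sketched as follows. This is not Heron's parsing library or its API; ExprParser and the little arithmetic grammar in the comment are assumptions made up for illustration. Each grammar rule becomes one method, and alternatives are tried in order, PEG-style, directly on the characters with no separate lexer.

```csharp
using System;

// Hypothetical example grammar (PEG-style, no whitespace allowed):
//   Expr   <- Term (('+' / '-') Term)*
//   Term   <- Factor (('*' / '/') Factor)*
//   Factor <- '(' Expr ')' / Number
//   Number <- [0-9]+
public sealed class ExprParser
{
    private readonly string _text;
    private int _pos; // index of the next unread character

    private ExprParser(string text) { _text = text; }

    // Parses and evaluates the whole string, or throws FormatException.
    public static int Evaluate(string text)
    {
        var parser = new ExprParser(text);
        int value = parser.Expr();
        if (parser._pos != text.Length)
            throw new FormatException("Unexpected input at position " + parser._pos);
        return value;
    }

    // Consumes the character c if it is next in the input.
    private bool Match(char c)
    {
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private int Expr()
    {
        int value = Term();
        while (true)
        {
            if (Match('+')) value += Term();
            else if (Match('-')) value -= Term();
            else return value;
        }
    }

    private int Term()
    {
        int value = Factor();
        while (true)
        {
            if (Match('*')) value *= Factor();
            else if (Match('/')) value /= Factor(); // integer division
            else return value;
        }
    }

    private int Factor()
    {
        if (Match('('))
        {
            int value = Expr();
            if (!Match(')'))
                throw new FormatException("Expected ')' at position " + _pos);
            return value;
        }
        return Number();
    }

    private int Number()
    {
        int start = _pos;
        while (_pos < _text.Length && _text[_pos] >= '0' && _text[_pos] <= '9') _pos++;
        if (start == _pos)
            throw new FormatException("Expected a number at position " + _pos);
        return int.Parse(_text.Substring(start, _pos - start));
    }
}
```

For example, ExprParser.Evaluate("2*(3+4)-5") returns 9. The sketch evaluates as it parses to stay short; a formatter would build a tree in each method instead of computing a value.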

cdiggins
A: 

The best book that I've read for learning the idioms of parsing is "Little Languages":

Little Languages on Amazon

If you can get your hands on the .NET source code for System.Text.RegularExpressions, you will also see a real world implementation of how to build a parser.

Justin Rogers has some excellent articles on how to build generic parsers on his blog:

Justin's Blog

And finally, if you want to enter the new world of parsers and grammars, you should really read up on 'Oslo' and on how to use the M language and MGrammar. They will give you a lot of flexibility when it comes to parsing and transforming the resulting object graph into other usable forms.

Justin's articles are probably the easiest way to get up and running with a raw parser built atop .NET.

Qwerty