I have been programming since 1999 for work and fun. I want to learn new things, and lately I've been focused on parsing, as a large part of my job is reading, integrating and analyzing data. I also have a large number of repetitive tasks that I think I could express in very simple domain-specific languages if the overhead was low enough. I have a few questions about the subject.
- Most of my current parsing code don't define a formal grammar. I usually hack something together in my language of choice because that's easy, I know how to do it and I can write that code very fast. It's also easy for other people I work with to maintain. What are the advantages and disadvantages of defining a grammar and generating a real parser (as one would do with ANTLR or YACC) to parse things compared with the hacks that most programmers used to write parsers?
- What are the best parser generation tools for writing grammar-based parsers in C++, Perl and Ruby? I've looked at ANTLR and haven't found much about using ANTLRv3 with a C++ target, but otherwise that looks interesting. What are the other tools that are similar to ANTLR that I should be reading about?
- What are the canonical books and articles that someone interested in learning more about parsing? A course in compilers unfortunately wasn't part of my education, so basic material is very welcome. I've heard great things about the Dragon Book, but what else is out there?