tags:

views:

687

answers:

12

I've been given a job of 'translating' one language into another. The source is too flexible (complex) for a simple line by line approach with regex. Where can I go to learn more about lexical analysis and parsers?

+1  A: 

I've recently been working with PLY which is an implementation of lex and yacc in Python. It's quite easy to get started with it and there are some simple examples in the documentation.

Parsing can quickly become a very technical topic and you'll find that you probably won't need to know all the details of the parsing algorithm if you're using a parser builder like PLY.

Greg Hewgill
+6  A: 
John
+1  A: 

If you prefer Java based tools, the Java Compiler Compiler, JavaCC, is a nice parser/scanner. It's config file driven, and will generate java code that you can include in your program. I haven't used it a couple years though, so I'm not sure how the current version is. You can find out more here: https://javacc.dev.java.net/

tsellon
+8  A: 

If you want to get "emotional" about the subject, pick up a copy of "The Dragon Book." It is usually the text in a compiler design course. It will definitely meet your need "learn more about lexical analysis and parsers" as well as a bunch of other fun stuff!

IMH(umble)O, save yourself an arm and/or leg and buy an older edition - it will fill your information desires.

Matt Cummings
+1  A: 

Gnu flex and bison are the new lex and yacc though. The syntax for BNF is often derided for being a bit obtuse. Some have moved to ANTLR and Ragel for this reason.

If you're not doing much translation, you may one to pull a one-off using multiline regexes with Perl or Ruby. Writing a compatible BNF grammar for an existing language is not a task to be taken lightly.

On the other hand, it is entirely possible to leverage any given language's .l and .y files if they are available as open source. Then, you could construct new code from an existing parse tree.

hoyhoy
+1  A: 

Lexing/Parsing + typecheck + code generation is a great CS exercise I would recommend it to anyone wanting a solid basis, so I'm all for the Dragon Book

svrist
+3  A: 

Niklaus Wirth's book "Compiler Construction" (available as a free PDF) http://www.google.com/search?q=wirth+compiler+construction

tamberg
+1  A: 

Yet another textbook to consider is Programming Language Pragmatics. I prefer it over the Dragon book, but YMMV.

If you're using Perl, yet another tool to consider is Parse::RecDescent.

If you just need to do this translation once and don't know anything about compiler technology, I would suggest that you get as far as you can with some fairly simplistic translations and then fix it up by hand. Yes, it is a lot of work. But it is less work than learning a complex subject and coding up the right solution for one job. That said, you should still learn the subject, but don't let not knowing it be a roadblock to finishing your current project.

+1  A: 

I found this site helpful:

Lex and YACC primer/HOWTO

The first time I used lex/yacc was for a relatively simple project. This tutorial was all I really needed. When I approached more complex projects later, the familiarity I had from this tutorial and a simple project allowed me to build something fancier.

Matt
+1  A: 

After taking (quite) a few compilers classes, I've used both The Dragon Book and C&T. I think C&T does a far better job of making compiler construction digestible. Not to take anything away from The Dragon Book, but I think C&T is a far more practical book.

Also, if you like writing in Java, I recommend using JFlex and BYACC/J for your lexing and parsing needs.

jacobko
+1  A: 

Lots of people have recommended books. For many these are much more useful in a structured environment with assignments and due dates and so forth. Even if not, having the material presented in a different way can help greatly.

(a) Have you considered going to a school with a decent CS curriculum?
(b) There are lots of online lectures, such as MIT's Open Courseware. Their EE/CS section has many courses that touch on parsing, though I can't see any on parsing per se. It's typically introduced as one of the first theory courses as language classification and automata is at the heart of much of CS theory.

wnoise
+1  A: 

Parsing Techniques - A Practical Guide By Dick Grune and Ceriel J.H. Jacobs

This book (freely available as PDF) gives an extensive overview of different parsing techniques/algorithms. If you really want to understand the different parsing algorithms, this IMO is a better reference than the Dragon Book (as Parsing Techniques focuses entirely on parsing, while the Dragon Book covers parsing only as one - although important - part of the compiler construction process).

Gio