Are there any existing C++ grammar files for ANTLR?

I'm looking to lex, not parse, some C++ source code files.

I've looked on the ANTLR grammar page, and it looks like there is one listed, created by Sun Microsystems.

However, it seems to be a generated parser.

Can anyone point me to a C++ ANTLR lexer or grammar file?

+1  A: 

I can't speak for ANTLR's grammars.

The DMS Software Reengineering Toolkit has a robust C++ front end.

The lexer handles all the cruft for the ANSI, GCC3, and MS Visual Studio 2008 dialects, including large-precision floating-point numbers, etc.

It will also parse these dialects reliably, build symbol tables, allow you to carry out program transformations, etc.
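To give a flavor of the literal cruft a C++ lexer has to handle, here is a small sketch, assuming C++17. `integer_literal_value` is a hypothetical helper, not part of DMS or ANTLR. Even this integer-only case has to deal with hex, binary, and octal prefixes, C++14 digit separators, and `u`/`l` suffixes; floating-point literals, overflow detection, and suffix validation are deliberately left out.

```cpp
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical helper: the value of a C++ integer literal such as "0x1F",
// "0b1010", "017", or "1'000'000ULL". Overflow is not detected.
std::optional<unsigned long long> integer_literal_value(std::string_view lit) {
    // Strip any trailing u/U/l/L suffix characters (their combination is not validated).
    while (!lit.empty() && std::string_view("uUlL").find(lit.back()) != std::string_view::npos) {
        lit.remove_suffix(1);
    }

    // Pick the base from the prefix: 0x/0X hex, 0b/0B binary, leading 0 octal.
    unsigned base = 10;
    if (lit.size() > 1 && lit[0] == '0') {
        if (lit[1] == 'x' || lit[1] == 'X') {
            base = 16;
            lit.remove_prefix(2);
        } else if (lit[1] == 'b' || lit[1] == 'B') {
            base = 2;
            lit.remove_prefix(2);
        } else {
            base = 8;
            lit.remove_prefix(1);
        }
    }
    if (lit.empty()) return std::nullopt;

    unsigned long long value = 0;
    for (char ch : lit) {
        if (ch == '\'') continue;  // C++14 digit separator
        unsigned digit;
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            digit = static_cast<unsigned>(ch - '0');
        } else if (ch >= 'a' && ch <= 'f') {
            digit = static_cast<unsigned>(ch - 'a' + 10);
        } else if (ch >= 'A' && ch <= 'F') {
            digit = static_cast<unsigned>(ch - 'A' + 10);
        } else {
            return std::nullopt;  // not a digit character
        }
        if (digit >= base) return std::nullopt;  // e.g. '9' in an octal literal
        value = value * base + digit;
    }
    return value;
}
```

For example, `"0x1F"` gives 31, `"017"` gives 15 (octal), and `"09"` is rejected because 9 is not an octal digit.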

Ira Baxter
I checked out your website. It seems like you have some cool tools at reasonable prices, but your website could use some work in both structure and look and feel.
Andre Artus
A: 

Lexing standard C++ is fairly simple, since the keywords are simple and straightforward. It shouldn't take someone more than an hour or so to write a lexer.
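For the easy part of that claim, here is a minimal sketch of a naive C++ tokenizer, assuming C++17. The names `naive_lex`, `Token`, and `Kind` are hypothetical, and the keyword table lists only 12 of the real keywords. It deliberately ignores comments, literals other than plain decimal integers, multi-character operators such as `->` and `::`, and the preprocessor, which is where the comments on this answer say the real effort goes.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// A small subset of C++ keywords, kept sorted so std::binary_search works.
constexpr std::array<std::string_view, 12> kKeywords = {
    "class", "const", "for", "if", "int", "namespace",
    "return", "struct", "template", "typename", "void", "while"};

enum class Kind { Keyword, Identifier, Number, Punct };

struct Token {
    Kind kind;
    std::string text;
};

// Naive tokenizer: identifiers/keywords, decimal integers, single-char punctuators.
std::vector<Token> naive_lex(std::string_view src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: letters, digits, underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            std::string_view word = src.substr(start, i - start);
            bool is_kw = std::binary_search(kKeywords.begin(), kKeywords.end(), word);
            out.push_back({is_kw ? Kind::Keyword : Kind::Identifier, std::string(word)});
        } else if (std::isdigit(c)) {
            // Decimal digits only; no hex, suffixes, or floating point.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({Kind::Number, std::string(src.substr(start, i - start))});
        } else {
            // Every other character becomes its own punctuator token.
            ++i;
            out.push_back({Kind::Punct, std::string(src.substr(start, 1))});
        }
    }
    return out;
}
```

For example, `naive_lex("int x = 42;")` yields five tokens (`int` as a keyword, then `x`, `=`, `42`, `;`), while `naive_lex("a->b")` splits `->` into two punctuators, one of the many gaps a production lexer has to close.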

Parsing C++ is a total nightmare, largely due to the syntactic similarity between uses and declarations, and the template syntax. Because of these, C++ is actually not a context-free language.
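A small sketch of the use/declaration ambiguity, with hypothetical names: the token sequence `x * y` declares a pointer when `x` names a type and multiplies when `x` names a variable. A lexer emits the same three tokens either way; only a parser that tracks what `x` denotes can tell the two apart.

```cpp
// Hypothetical names throughout; the point is only the tokens "x * y".
namespace type_context {
struct x {
    int v = 42;
};

int f() {
    x obj;
    x * y = &obj;  // x names a type: this declares y as a pointer to x
    return y->v;
}
}  // namespace type_context

namespace value_context {
int x = 6;

int f() {
    int y = 7;
    int product = x * y;  // x names a variable: the same tokens multiply
    return product;
}
}  // namespace value_context
```

Both functions return 42, but for entirely different syntactic reasons.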

Joel
An hour or two? You've obviously never done this. Our production-quality C++ front end (see the DMS answer to this question) has a lexer definition of 10K+ lines. You might think that unreasonable until you sort out all the different types of literals involved, strings with character escapes, line continuations, trigraphs, the goofiness of the preprocessor directives, pragmas, MS DLL declarations, etc. Our lexer also converts literal values into native data types (e.g., float literals into the actual double value, string literals into the actual string), but even ignoring that, it's still 5K SLOC.
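Two of the items in that list, trigraphs and line continuations, correspond to the first two translation phases of the C++ standard. Below is a minimal sketch, assuming C++17 and an input already in memory with `\n` line endings; the function names are hypothetical. It only replaces the nine trigraphs (such as `??=` for `#` and `??/` for a backslash; trigraphs were removed from the language in C++17) and splices a backslash immediately followed by a newline. It does not track source positions, which a real lexer needs for diagnostics.

```cpp
#include <cstddef>
#include <string>

// Phase 1 (partial): replace each of the nine trigraph sequences,
// i.e. two question marks followed by one of = / ' ( ) ! < > -
std::string replace_trigraphs(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '?' && i + 2 < in.size() && in[i + 1] == '?') {
            char r = '\0';
            switch (in[i + 2]) {
                case '=': r = '#'; break;
                case '/': r = '\\'; break;
                case '\'': r = '^'; break;
                case '(': r = '['; break;
                case ')': r = ']'; break;
                case '!': r = '|'; break;
                case '<': r = '{'; break;
                case '>': r = '}'; break;
                case '-': r = '~'; break;
                default: break;  // not a trigraph; copy the '?' as-is
            }
            if (r != '\0') {
                out += r;
                i += 2;  // consume the whole three-character sequence
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Phase 2: delete every backslash that is immediately followed by a newline,
// joining the two physical lines into one logical line.
std::string splice_lines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == '\n') {
            ++i;  // skip both the backslash and the newline
            continue;
        }
        out += in[i];
    }
    return out;
}
```

Order matters: running `replace_trigraphs` first means a `??/` at the end of a line becomes a backslash and is then spliced away by `splice_lines`, which is why such sequences could silently swallow the next line of a comment in older code.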
Ira Baxter
Bah, I forgot about the preprocessor. That's actually ugly, although most compilers will generate preprocessed code for you.
Joel
It isn't just the preprocessor. I included the preprocessor *lexing* in my comment above. The preprocessor machinery (macro capture, substitution, conditional evaluation, token gluing, ...) itself is another several thousand lines of (obtuse, thanks C++ standard) code. There's a lot more complexity in the lexemes than you might think based on just the list of keywords (you're right, the keywords are easy).
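A small illustration of why the preprocessor machinery is a separate problem from lexing, using hypothetical macros `GLUE`, `STR`, and `XSTR`: a tool that only lexes the file sees the tokens `GLUE ( foo , bar )`, not the identifier `foobar` that the compiler ends up calling, and stringizing (`#`) gives different results depending on whether its argument is macro-expanded first.

```cpp
#include <string>

// Hypothetical macros illustrating token gluing (##) and stringizing (#).
#define GLUE(a, b) a##b
#define STR(x) #x
#define XSTR(x) STR(x)

int foobar() { return 7; }

// A pure lexer sees GLUE ( foo , bar ) ( ) here; after macro expansion
// the compiler sees a call to foobar().
int call_glued() { return GLUE(foo, bar)(); }

// '#' does not expand its argument, so this yields the original spelling.
std::string unexpanded() { return STR(GLUE(foo, bar)); }

// The extra XSTR level expands GLUE(foo, bar) to foobar before stringizing.
std::string expanded() { return XSTR(GLUE(foo, bar)); }
```

Here `call_glued()` returns 7, `unexpanded()` returns `"GLUE(foo, bar)"`, and `expanded()` returns `"foobar"`.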
Ira Baxter
+1  A: 

How about the ANTLR C++ grammar?

RP
@c14ppy, note that this grammar is not an ANTLR v3 grammar. I've never tried generating a v2 grammar with ANTLR v3, but if you get strange error messages, try using ANTLR v2 instead.
Bart Kiers