I was wondering if anyone knows of existing C++ parsers/code models that can be used programmatically in Java. I'm looking for something similar to the Eclipse CDT that can be used as a library from Java (and that does not rely upon Eclipse). Thanks in advance.

A: 

There are some incomplete C++ grammars for parser generators like Yacc (with Lex), ANTLR, Jack, etc.

C++ has an undecidable grammar, so LALR and BNF grammars will always be incomplete, but as long as you're not trying to write a C++ compiler, they should be good enough.

greyfade
C++ is not undecidable. It is not LR or LALR, which means it is hard to parse using those parser technologies. That's only a small pain compared to doing symbol table construction for C++, which is a royal bitch (600 pages of C++ standard ...). Frankly, it is just silly to try to roll your own C++ parser unless that's how you want to make your living.
Ira Baxter
My mistake. It being "undecidable" is a comment I frequently see, rarely disputed. I guess I misunderstood the meaning of "undecidable."
greyfade
C++ is really undecidable because parse trees sometimes depend on semantic variables. "Undecidability" here means that the halting problem can be reduced to the parsing of C++. See http://yosefk.com/c++fqa/web-vs-c++.html#misfeature-3
dmitry_vk
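To make the context-dependence concrete: in C++ the statement "a * b;" declares a pointer b if a names a type, and is a discarded multiplication otherwise, so the parser has to consult what it has already seen. The toy Java class below is a minimal sketch of that one decision, assuming only the exact statement shape "A * B;". The class name DeclOrExpr and its methods are hypothetical. It does not demonstrate undecidability (that argument goes through templates), only that the parse depends on semantic information.

```java
import java.util.HashSet;
import java.util.Set;

public class DeclOrExpr {
    // Hypothetical record of names that earlier declarations introduced as types
    private final Set<String> typeNames = new HashSet<>();

    public void declareType(String name) {
        typeNames.add(name);
    }

    /**
     * Classifies a statement of the exact form "A * B;" (whitespace-tolerant).
     * Returns "declaration" if A is a known type name (B is then a pointer variable),
     * otherwise "expression" (A multiplied by B, result discarded).
     */
    public String classify(String stmt) {
        String s = stmt.trim();
        if (!s.endsWith(";")) {
            throw new IllegalArgumentException("expected ';'");
        }
        String[] parts = s.substring(0, s.length() - 1).split("\\*");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'A * B;'");
        }
        String lhs = parts[0].trim();
        return typeNames.contains(lhs) ? "declaration" : "expression";
    }

    public static void main(String[] args) {
        DeclOrExpr p = new DeclOrExpr();
        System.out.println(p.classify("a * b;")); // expression
        p.declareType("a");                        // as if "typedef int a;" had been seen
        System.out.println(p.classify("a * b;")); // declaration
    }
}
```

The same token sequence gets two different parse trees depending only on the contents of typeNames, which is exactly the information a context-free LALR grammar cannot see.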
A: 

There are some C++ grammars out there for JavaCC. Try Google.

Mr.Ree
I don't think you'll get a robust JavaCC grammar for C++. If you do, you still have to worry about preprocessor handling. Both of these are a small pain compared to doing symbol table construction for C++, which is a royal bitch (600 pages of the C++ standard ...).
Ira Baxter
You can either skip preprocessing or run it independently as a first pass via something like "g++ -E". Books like the ARM (Annotated C++ Reference Manual) include the C++ grammar, although the ARM is a little outdated since C++ has been improved since then. I thought the goal here was parsing, not compiling, so a symbol table is unnecessary. I have written C++ JavaCC parsers in ages past, and open source solutions do exist.
Mr.Ree
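Running the preprocessor as a separate first pass, as suggested above, is easy to drive from Java. This is a minimal sketch, assuming g++ is installed and on the PATH; the class name Preprocess is hypothetical. The -E flag stops g++ after preprocessing and -P suppresses the line markers, so a Java-side parser sees plain C++ text (at the cost of losing the original file and line positions).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Preprocess {
    /** Builds the command line: g++ -E -P [-Idir ...] source */
    public static List<String> command(String source, List<String> includeDirs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("g++");
        cmd.add("-E"); // stop after preprocessing
        cmd.add("-P"); // omit linemarker output
        for (String dir : includeDirs) {
            cmd.add("-I" + dir); // extra include search directory
        }
        cmd.add(source);
        return cmd;
    }

    /** Runs the preprocessor and returns its stdout; throws if g++ exits non-zero. */
    public static String run(String source, List<String> includeDirs)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(source, includeDirs));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show compiler diagnostics on our stderr
        Process proc = pb.start();
        String out;
        try (InputStream in = proc.getInputStream()) {
            // Drain stdout before waitFor() so a large output cannot block the child
            out = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int status = proc.waitFor();
        if (status != 0) {
            throw new IOException("g++ -E exited with status " + status);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Preprocess foo.cpp
        System.out.print(run(args[0], List.of()));
    }
}
```

The returned string can then be handed to whatever grammar-based parser you choose; macros and #includes are already expanded, so the grammar never has to deal with them.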
The OP wasn't clear what he wanted to do. In any case, there's not a lot you can do to C++ without a symbol table, so unless he's looking for very limited information, he needs one.
Ira Baxter
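To give a feel for why the symbol table is the hard part: even the simplest useful version needs nested scopes, so that an inner declaration hides an outer one. The sketch below is hypothetical (ScopedSymbolTable and Kind are made-up names) and deliberately ignores namespaces, classes, overloading, templates and argument-dependent lookup, all of which real C++ name lookup has to handle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ScopedSymbolTable {
    // Hypothetical entry kind: just enough to answer "is this name a type?"
    public enum Kind { TYPE, VARIABLE, FUNCTION }

    // push() adds at the head, so iteration visits the innermost scope first
    private final Deque<Map<String, Kind>> scopes = new ArrayDeque<>();

    public ScopedSymbolTable() {
        enterScope(); // global scope
    }

    public void enterScope() {
        scopes.push(new HashMap<>());
    }

    public void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot exit the global scope");
        }
        scopes.pop();
    }

    /** Declares a name in the innermost scope, replacing any entry there. */
    public void declare(String name, Kind kind) {
        scopes.peek().put(name, kind);
    }

    /** Innermost-first lookup, so inner declarations hide outer ones. */
    public Optional<Kind> lookup(String name) {
        for (Map<String, Kind> scope : scopes) {
            Kind k = scope.get(name);
            if (k != null) {
                return Optional.of(k);
            }
        }
        return Optional.empty();
    }
}
```

For example, after declaring "a" as a TYPE globally and then as a VARIABLE inside a block, lookup("a") yields VARIABLE until exitScope() is called, and TYPE afterwards.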
A:

You don't want to build your own C++ parser. It'll kill you.

You already know about the Eclipse CDT project (www.ibm.com/developerworks/library/os-ecl-cdt3/index.html). AFAIK, that parser is, well, a bit fuzzy around the edges. YMMV. Advantage: it's in Java (and in Eclipse, if you care). If you want to process C++, and do it in Java, this might be your only practical choice.

There is also our DMS Software Reengineering Toolkit C++ front end: http://www.semdesigns.com/Products/FrontEnds/CppFrontEnd.html Works with a wide variety of C++ dialects (ANSI, GNU, MSVC 2005/2008), tested by fire on millions of lines of code. Disadvantage from your point of view: Not in Java. But if you really want to analyze C++, making a rule that you are only willing to do it in Java might not serve you the best.

Ira Baxter