I want to get an XML representation of the AST of Java and C code. I already asked this question three months ago, but the suggested solutions didn't work well for me:

  • srcML seems to be a good solution for this problem, but it does not support line numbers and columns, and I need that feature.
  • Elsa: to quote, "There is ongoing effort to export the Elsa AST as an XML document; we expect to be able to advertise this in the next public release."
  • DMS: I didn't understand it.
  • Especially for Java, there is JavaML, which supports line numbers, but its SourceForge page doesn't list any files.

Question: is there software available that converts an AST into XML and supports line numbers (and columns), especially for Java and C/C++? Is there an alternative to JavaML and srcML?

PS: I don't want parser generators. I hope to find a tool that can be used from the console by typing ./my-xml-generator Test.java (or something like that). A Java implementation would be great too.

A: 

There is GCC-XML at http://www.gccxml.org/HTML/Index.html. Caveat: I haven't actually used it myself.

anon
AFAIK, GCC-XML only dumps type definition data, not the code for the body of functions.
Ira Baxter
A: 

What didn't you understand about DMS?

It exists.

It has compiler-accurate parsers for C and Java (and many other languages).

It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with the file/line/column of the token that represents the start of that node, and the final column can be computed by a DMS API call.

It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

 run DMSDomainParser ++XML  <path_to_your_file>

You probably don't really want what you are wishing for. A 1000-line C program may pull in 100K lines of #include file stuff. Each line produces 5-10 nodes. The DMS dump is succinct and each node takes only one line, so you are looking at roughly 1 million lines of XML of 60 characters each, or about 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
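To make that back-of-the-envelope estimate concrete, here is a minimal Java sketch of the arithmetic. The class and method names are hypothetical, invented only for illustration; the inputs (100K lines after #include expansion, 5-10 nodes per line, about 60 characters per node) are the rough figures quoted above, not measurements.

```java
public class XmlSizeEstimate {

    // Estimated XML size in characters, assuming one XML line per AST node:
    // source lines * nodes per source line * characters per node line.
    static long estimateChars(long sourceLines, int nodesPerLine, int charsPerNode) {
        return sourceLines * nodesPerLine * charsPerNode;
    }

    public static void main(String[] args) {
        long lines = 100_000;   // assumed line count after #include expansion
        int charsPerNode = 60;  // assumed length of one dumped node line

        long low = estimateChars(lines, 5, charsPerNode);   // 5 nodes per line
        long high = estimateChars(lines, 10, charsPerNode); // 10 nodes per line

        System.out.println(low + " to " + high + " characters");
    }
}
```

Running it prints "30000000 to 60000000 characters", i.e. tens of megabytes of XML for a single small translation unit, before any XML tool has even started to parse it.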

DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.

Moral: it is much better to use something like DMS to manipulate the AST directly than to fight with XML.

Full disclosure: I'm the architect behind DMS.

Ira Baxter