
I'm trying to automatically resolve typedefs in arbitrary C++ or C projects.

Because some of the typedefs are defined in system header files (for example uint32), I'm currently trying to achieve this by running the gcc preprocessor on my code files and then scanning the preprocessed files for typedefs. I should then be able to replace the typedefs in the project's code files.

I'm wondering if there is another, perhaps simpler, way that I'm missing. Can you think of one?

The reason I want to do this: I'm extracting code metrics from C/C++ projects with different tools. The metrics are method-based. After extracting the metrics, I have to merge the data produced by the different tools. The problem is that one of the tools resolves typedefs and the others don't. If typedefs are used for the parameter types of methods, I end up with metrics mapped to different method names that actually refer to the same method in the source code.

Think of this method in the source code: `int test(uint32 par1, int par2)`.
After running my tools, some of my metrics are mapped to a method named `int test(uint32 par1, int par2)` and others to `int test(unsigned int par1, int par2)`.

+2  A: 

GCC-XML can help with resolving the typedefs: you'd have to follow the type ids of <Typedef> elements until they resolve to a <FundamentalType>, <Struct> or <Class> element.

For replacing the typedefs in your project, you have a more fundamental problem, though: you can't simply search and replace, because you have to respect the scope of names. Think of e.g. function-local typedefs, namespace aliases or using directives.

Depending on what you're actually trying to achieve, there has to be a better way.

Update: Actually, in the given context of fixing metrics data, replacing the type names using gcc-xml should work fine if it supports your code base.

Georg Fritzsche
I'm not sure, but I think GCC-XML isn't actively developed any more. According to the website, the last official version is from 2004, and if I recall correctly, the CVS version is only based on some 3.x version of GCC. I don't really know if that matters, though. Hmm, namespaces. I hadn't thought of that yet, to be honest. That complicates the matter...
Customizer
I thought you meant replacing the types in the source; that would have been more complicated. If you only need to fix the metrics data and not the source, however, gcc-xml should be sufficient if it supports your code base.
Georg Fritzsche
Actually, I think the projects I'm analyzing should all be compilable with GCC 3.x, so this could work.
Customizer
According to their website, GCC-XML is able to simulate GCC up to version 4.2, so this should not be a problem. But the fact that it uses GCC's C++ parser could be problematic with some C programs.
Customizer
+3  A: 

You can use Clang (the LLVM C/C++ compiler front end) to parse code in a way that preserves information on typedefs and even macros. It has a very nice C++ API for reading the data after the source code has been parsed into the AST (abstract syntax tree): http://clang.llvm.org/

If you are instead looking for a simple program that already does the resolving for you (rather than the Clang programming API), I think you are out of luck; I have never seen such a thing.

Tronic
I thought Clang's C++ parser isn't complete yet?
Georg Fritzsche
The code generation is quite incomplete, but it can already handle most of the standard library, as well as many external libraries. The parser is more complete, but even it still lacks a few things (so you cannot use it e.g. with Boost Spirit.Qi). Still, I think it might be the best available option for what Customizer is asking.
Tronic
The API does look great. I'm quite curious to try it out once it has become stable.
Georg Fritzsche
+3  A: 

If you do not care about figuring out where they are defined, you can use objdump to dump the C++ symbol table, in which the typedefs appear already resolved.

lorien$ objdump --demangle --syms foo

foo:     file format mach-o-i386

SYMBOL TABLE:
00001a24 g       1e SECT   01 0000 .text dyld_stub_binding_helper
00001a38 g       1e SECT   01 0000 .text _dyld_func_lookup
...
00001c7c g       0f SECT   01 0080 .text foo::foo(char const*)
...

This snippet is from the following structure definition:

#include <string>

typedef char const* c_string;
struct foo {
    typedef c_string ntcstring;
    foo(ntcstring s): buf(s) {}
    std::string buf;
};

This does require that you compile everything, and it will only show symbols that end up in the resulting executable, so there are a few limitations. The other option is to have the linker dump a symbol map. With the linker used here (a GCC 4.0.1 toolchain), add -Wl,-map -Wl,name, where name is the name of the file to generate (see the note below; other linker versions spell this differently). This approach does not demangle the names, but with a little work you can reverse-engineer the compiler's mangling conventions. The output for the previous snippet will include something like:

0x00001CBE  0x0000005E  [  2] __ZN3fooC2EPKc
0x00001D1C  0x0000001A  [  2] __ZN3fooC1EPKc

You can decode these using the C++ ABI specification. The extra leading underscore is the Mach-O symbol prefix, so the derivation starts at '_Z'. Once you get comfortable with how this works, the mangling table included with the ABI becomes priceless. The derivation in this case is:

<mangled-name>           ::= '_Z' <encoding>
<encoding>               ::= <name> <bare-function-type>
  <name>                 ::= <nested-name>
    <nested-name>        ::= 'N' <source-name> <ctor-dtor-name> 'E'
      <source-name>      ::= <number> <identifier>
      <ctor-dtor-name>   ::= 'C2' # base object constructor ('C1' = complete object constructor)
    <bare-function-type> ::= <type>+
      <type>             ::= 'P' <type> # pointer to
        <type>           ::= <cv-qualifier> <type>
          <cv-qualifier> ::= 'K' # constant
            <type>       ::= 'c' # character

Note: the linker arguments differ between versions, so check your local manual (man ld) to see whether map-file generation uses -map filename in your version. With GNU ld, use -Wl,-Map=filename, or use -Wl,-M and redirect stdout to a file.

D.Shawley
When I try to run my compiler like this: `g++ foo.cpp -Wl,-map -Wl,mapname`, I get the error `/usr/bin/ld: unrecognised emulation mode: ap` followed by `Supported emulations: elf_i386 i386linux`. Am I using the parameters correctly? (g++ version: 4.4.2 20091208 (prerelease), ld version: 2.20.0.20091101)
Customizer
I updated my answer. My local version is still at 4.0.1 and it looks like the arguments have changed. Try `g++ foo.cpp -Wl,-M > foo.map`.
D.Shawley
The program c++filt can be used to demangle the names.
Tronic
@Tronic - thanks. I was racking my brains trying to remember which program did that. With liberal use of 'grep', 'sed', and 'awk', you can extract the symbols in a one-liner, something like `grep __Z foo.map | grep -v '.eh$' | grep -v '\$' | awk '{ print $NF }' | c++filt --strip-underscore`.
D.Shawley