I'm looking for a nice stackoverflow-style answer to the first question in this old blog post, which I'll repeat below:

"I’d really like some tool (ideally, g++ based) that shows me what parts of compiled/linked code are generated from what parts of C++ source code. For instance, to see whether a particular template is being instantiated for hundreds of different types (fixable via a template specialization) or whether code is being inlined excessively, or whether particular functions are larger than expected."

+1  A: 

I don't know how to map code->generated assembly in general.

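For template instantiations you can use something like "strings -a <binary> | grep <pattern> | sort -u | gc++filt" to get a rough picture of what's being created (gc++filt is the g-prefixed name some systems give GNU c++filt; elsewhere it is plain c++filt).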
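As a rough follow-up to that pipeline, here is a minimal Python sketch that groups demangled symbol names by the template they come from and counts the distinct symbols for each. It assumes you already have one demangled name per line (for example the output of the pipeline above); the function names template_name and count_instantiations are made up for illustration, and the name extraction is a heuristic, not a real C++ parser.

```python
from collections import Counter
from typing import Iterable, Optional


def template_name(demangled: str) -> Optional[str]:
    """Return the name in front of the first '<', or None if there is none.

    Heuristic only: names such as operator< and operator<< are skipped, and a
    leading return type ("void foo<int>(int)") is dropped by keeping the last
    whitespace-separated token.
    """
    head, sep, _ = demangled.partition("<")
    if not sep or head.endswith("operator"):
        return None
    parts = head.split()
    return parts[-1] if parts else None


def count_instantiations(symbols: Iterable[str]) -> Counter:
    """Count distinct demangled symbols per template name."""
    counts = Counter()
    for symbol in set(symbols):  # drop exact duplicates first
        name = template_name(symbol)
        if name:
            counts[name] += 1
    return counts
```

Note that this counts every distinct symbol, so each member function of a class template instantiation adds to that template's total. Calling count_instantiations(lines).most_common(20) would list the templates that show up most often.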

The other two items you mentioned seem pretty subjective, actually. What is "too much" inlining? Are you worried your binary file is getting inflated? The only thing to do there is to go into gdb and disassemble the caller to see what was generated; there is nothing that checks for "excessive" inlining in general.

For function size, again I'm curious why it matters. Are you trying to find code that expands unexpectedly when compiled? How do you even define what an expected size is for a tool to examine? Again, you can always disassemble any function that you suspect is compiling to far more code than you want, and see exactly what the compiler is doing.

Mark B
In regards to "why it matters?" we are developing on a platform with a fixed limit for code size. Some insight would help us find the problem areas to attack first.
Evan Rogers
+3  A: 

It does seem like something like this should exist, but I haven't used anything like it. I can tell you how I'd go about scripting this together, though. There are probably swifter and/or sexier ways to do it.

First some stuff that you may already know:

The addr2line command takes an address and tells you which source line the machine code at that address came from. The executable needs to be built with debugging symbols, and you'll probably not want to optimize it much (-O0, -O1, or -Os is probably as high as you'd want to go at first anyway). addr2line has several flags and you'll want to read its manual page, but you will definitely need -C or --demangle if you want to see C++ function names that make sense in the output.

The objdump command can print out all kinds of interesting things about the stuff in many types of object files. One of the things it can do is print out a table representing the symbols in or referred to by an object file (including executables).

Now, what you want to do with that:

What you'll want is for objdump to tell you the address and size of the .text section. This is where the actual executable machine code lives. There are several ways to do this, but the easiest (for this, anyway) is probably:

objdump -h my_exe | grep text

That should result in something like:

 12  .text       0000049  000000f000  0000000f000 00000400  2**4

If you didn't grep it it would give you a heading like:

Idx  Name        Size     VMA         LMA         File off  Algn

I think for executables the VMA and LMA should be the same, so it won't matter which you use, but I think LMA is the best. You'll also want the size.
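If you end up scripting this, pulling the numbers out of that line is mostly a matter of splitting on whitespace. Here is a small Python sketch, assuming the column order from the heading above (Idx, Name, Size, VMA, LMA, File off, Algn) with all numeric fields in hex; parse_section_line is a hypothetical helper name.

```python
def parse_section_line(line: str) -> dict:
    """Split one section line from `objdump -h` into named fields.

    Assumes the columns are Idx, Name, Size, VMA, LMA, File off, Algn and
    that Size, VMA, LMA and File off are hexadecimal without a 0x prefix.
    """
    idx, name, size, vma, lma, file_off, algn = line.split()[:7]
    return {
        "idx": int(idx),
        "name": name,
        "size": int(size, 16),
        "vma": int(vma, 16),
        "lma": int(lma, 16),
        "file_off": int(file_off, 16),
        "align": algn,
    }
```

Feeding it the sample line above gives a size of 0x49 and an address of 0xf000 for both VMA and LMA.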

With the LMA and size you can repeatedly call addr2line, asking for the source code origin of the machine code at each address. I'm not sure how this behaves if you pass an address that falls in the middle of an instruction, but I think it should work.

addr2line -e my_exe <address>

The output from this will be a path/filename, a colon, and a line number. If you count the occurrences of each unique path/file:num, you can look at the ones with the highest counts. A Perl hash using the path/file:num as the key and a counter as the value would be an easy way to implement this, though there are faster ways if you find that it runs too slowly. You could also filter out, early on, anything you can determine doesn't need to be included. For displaying your output you may want to collapse different lines from the same function, but you may notice that different lines within one function have different counts, which could be interesting. Either way, that could be done by making addr2line tell you the function name (its -f flag) or by using objdump -t in the first step and working one function at a time.
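Here is roughly what that counting loop could look like. This sketch uses Python with a Counter in place of the Perl hash mentioned above, and it feeds all the addresses to a single addr2line process on standard input instead of starting one process per address. The function names are made up, stepping one byte at a time is an assumption (instruction lengths vary by architecture), and the handling of "??" and "(discriminator N)" output is based on what GNU addr2line typically prints.

```python
import subprocess
from collections import Counter
from typing import Iterable, List


def text_addresses(start: int, size: int, step: int = 1) -> range:
    """Return every step-th address in [start, start + size)."""
    return range(start, start + size, step)


def lookup_lines(exe: str, addrs: Iterable[int]) -> List[str]:
    """Run addr2line once and return one file:line string per address."""
    stdin = "\n".join(hex(a) for a in addrs) + "\n"
    result = subprocess.run(
        ["addr2line", "-e", exe],
        input=stdin, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def tally_locations(lines: Iterable[str]) -> Counter:
    """Count how many addresses map to each file:line."""
    counts = Counter()
    for line in lines:
        # Some addr2line versions append " (discriminator N)"; drop it.
        loc = line.split(" (discriminator")[0].strip()
        if not loc or loc.startswith("??"):  # no debug info for this address
            continue
        counts[loc] += 1
    return counts
```

With the numbers from the sample objdump line, tally_locations(lookup_lines("my_exe", text_addresses(0xf000, 0x49))).most_common(20) would list the twenty source lines that account for the most bytes of .text.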

If you see that some template code or other code lines are showing up in your executable more often than you think they should, you can easily locate them and have a closer look. Macros and inline functions may end up manifesting themselves differently than you expect.

If you didn't know, objdump and addr2line are from the GNU binutils package, which includes several other useful tools.

nategoose
+1  A: 

If you're looking to find sources of code bloat in your C++ code, I've used 'nm' for that. The following command will list all the symbols in your app with the biggest code and data chunks at the top:

nm --demangle --print-size --size-sort --reverse-sort <executable_or_lib_name> | less
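If you want totals and not just a sorted list, the output of that command is easy to post-process. Here is a small Python sketch, assuming nm's default (BSD-style) output where --print-size gives four columns: address, size, type letter, and name. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


def parse_nm_line(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (size, type letter, name) for one nm line, or None to skip it."""
    parts = line.split(None, 3)  # a demangled name may itself contain spaces
    if len(parts) < 4:
        return None
    _value, size, kind, name = parts
    try:
        return int(size, 16), kind, name.strip()
    except ValueError:
        return None


def bytes_by_kind(nm_output: str) -> Dict[str, int]:
    """Sum symbol sizes per nm type letter (local and global merged)."""
    totals = defaultdict(int)
    for line in nm_output.splitlines():
        parsed = parse_nm_line(line)
        if parsed:
            size, kind, _name = parsed
            totals[kind.upper()] += size
    return dict(totals)
```

The T total is a rough figure for code, while D, B and R cover initialized data, zero-initialized data and read-only data.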
Kevin S
+1  A: 

I don't know if it will help but there is a gcc flag to write the assembly code it generates to a text file for your examination.

"-S Used in place of -c to cause the assembler source file to be generated, using .s as the extension, instead of the object file. This may be useful if you need to examine the generated assembly code. "

Jay
Thanks, that is useful but I was hoping for something more tailored to my problem.
Evan Rogers
A: 

In Visual C++, this is essentially what .PDB files are for.

Crashworks
Can you provide details? How can I determine the code-size associated with a symbol?
Evan Rogers
+1  A: 

In most C compilers there is a way to generate a .map file. This file lists all of the compiled libraries, their addresses, and their sizes. You can use that map file to help you determine which files you should be looking to optimize first.
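As an illustration, here is a Python sketch that adds up code size per object file from a map file. It assumes the layout GNU ld writes (for example when linking with g++ -Wl,-Map=out.map), where each input section appears on an indented line of the form "section address size file". Other linkers use different layouts, long section names that ld wraps onto a second line are not handled, and size_by_object is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches indented input-section lines such as:
#  .text          0x0000000000401000       0x2a build/main.o
INPUT_SECTION = re.compile(
    r"^\s+(\.\S+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+(\S.*)$"
)


def size_by_object(map_text: str, prefix: str = ".text") -> Dict[str, int]:
    """Sum the sizes of input sections starting with `prefix`, per file."""
    totals = defaultdict(int)
    for line in map_text.splitlines():
        match = INPUT_SECTION.match(line)
        if match and match.group(1).startswith(prefix):
            totals[match.group(4).strip()] += int(match.group(3), 16)
    return dict(totals)
```

Sorting the result by value gives a first-pass list of which object files or libraries contribute the most code.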

mjh2007