I'm working on a very large-scale project, where the compilation time is very long. What tools can I use (preferably open source) on Linux to find the most heavily included files and optimize their usage? Just to be clearer, I need a tool which will, given the dependencies, show me which headers are included the most. By the way, we do use distributed compiling.

A: 

IIRC gcc could create dependency files.

EricSchaefer
A: 

You might want to look at distributed compiling, see for example distcc

Toni Ruža
+4  A: 

Check out makedepend

Iulian Şerbănoiu
This gives me the dependencies for each file. I need something that, given this, will find the most included files.
+2  A: 

Tools like doxygen (used with the graphviz options) can generate dependency graphs for include files... I don't know if they'd provide enough overview for what you're trying to do, but it could be worth trying.

slicedlime
A: 

This is not exactly what you are searching for, and it might not be easy to set up, but maybe you could have a look at lxr: lxr.linux.no is a browsable kernel tree.

In the search box, if you enter a filename, it will give you where it is included. But this is still guessing, and it does not track chained dependencies.

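Maybe something like this (-f follows the compiler processes that make spawns; newer systems open files through openat rather than open):

strace -f -e trace=open,openat -o outfile make
grep 'some handy regex to match header' outfile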
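As a rough sketch of what that grep step could look like, the function below counts how often each header was opened successfully according to the strace log. The function name count_opened_headers and the list of header extensions are assumptions made for illustration. It also assumes the log was recorded with -f, so that the compiler child processes are traced, and it ignores calls that strace split into "unfinished"/"resumed" pairs.

```shell
# Count successful opens of header files recorded in a strace log.
# $1: path to the log written by "strace ... -o outfile make"
count_opened_headers() {
    grep -E 'open(at)?\(' "$1" \
        | grep -v ' = -1 ' \
        | grep -oE '"[^"]+\.(h|hh|hpp|hxx)"' \
        | tr -d '"' \
        | sort | uniq -c | sort -k1,1nr -k2
}
```

Calling count_opened_headers outfile | head -20 would then list the twenty most frequently opened headers. Failed opens (the compiler probing each include directory) are filtered out by the " = -1 " test. Paths are counted exactly as they were passed to the system call, so the same header reached through two different spellings is counted twice.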
shodanex
+1  A: 

If you wish to know which files are included most of all, use this bash command:

find . -name '*.cpp' -exec egrep '^[[:space:]]*#include[[:space:]]+["<][[:alpha:][:digit:]_.]+[">]' {} \; \
| sort | uniq -c | sort -k 1rn,1 \
| head -20

It will display the top 20 files ranked by the number of times they were included.

Explanation: The 1st line finds all *.cpp files and extracts the lines with an "#include" directive from them. The 2nd line counts how many times each file was included, and the 3rd line takes the 20 most included files.

Haven't checked this out, but your solution won't work if the same file is included using two different paths. I.e. #include <dev/blah.h> and #include <./dev/blah.h> will be considered different include files.
Daemin
Basically a sound idea though.
Jonathan
+2  A: 

Using the Unix philosophy of "gluing together many small tools" I'd suggest writing a short script that calls gcc with the -M (or -MM) and -MF (OUTFILE) options (As detailed here). That will generate the dependency lists for the make tool, which you can then parse easily (relative to parsing the source files directly) and extract out the required information.
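A minimal sketch of the parsing half of such a script follows. It assumes the dependency files already exist, produced with something like gcc -MM -MF foo.d foo.c for each source file. The function name count_headers and the set of header extensions are assumptions made for illustration, and paths containing spaces are not handled.

```shell
# Count how many dependency files (one per translation unit) mention each header.
# Arguments: make-style dependency files, e.g. "foo.o: foo.c a.h \" plus continuation lines.
count_headers() {
    for depfile in "$@"; do
        # Drop trailing backslashes, put one path per line, remove the
        # "target:" token, keep headers only, de-duplicate per file.
        sed -e 's/\\$//' "$depfile" \
            | tr -s ' \t\n' '\n\n\n' \
            | grep -v ':$' \
            | grep -E '\.(h|hh|hpp|hxx)$' \
            | sort -u
    done | sort | uniq -c | sort -k1,1nr -k2
}
```

For example, count_headers build/*.d | head -20 (the build directory is hypothetical) would print the twenty headers that the largest number of translation units depend on. Because the compiler resolved the includes itself, chained includes and preprocessor conditionals are already taken into account.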

Daemin
+3  A: 

The answers here will give you tools which track #include dependencies. But there's no mention of optimization and such.

Aside: The book "Large Scale C++ Software Design" should help.

Vulcan Eager
+2  A: 

From the root level of the source tree, do the following (\t is the tab character):

find . -type f -exec grep '[ \t]*#include[ \t][ \t]*["<][^">][">]' {} ';' \
    | sed 's/^[ \t]*#include[ \t][ \t]*["<]//' \
    | sed 's/[">].*$//' \
    | sort \
    | uniq -c \
    | sort -r -k1 -n

Line 1 gets all the include lines. Line 2 strips off everything before the actual filename. Line 3 strips off the end of the line, leaving only the filename. Lines 4 and 5 count each unique line. Line 6 sorts by line count in reverse order.

paxdiablo
You need [^">]* rather than [^">] in the grep.
Douglas Leeder
This also doesn't track includes that are generated downstream. Parsing the output of "gcc -E -dI" will be a lot better for a more complex project.
Joe Hildebrand
+1  A: 

Use ccache. It will hash the inputs to a compilation, and cache the results, which will drastically increase the speed of these sorts of compiles.

If you wanted to detect the multiple includes, so that you could remove them, you could use makedepend as Iulian Șerbănoiu suggests:

makedepend -m *.c  -f - > /dev/null

will give a warning for each multiple include.

Joe Hildebrand
+1  A: 

The Bash scripts on this page aren't a good solution. They only work on simple projects. In a large project, like the one described in the question, C preprocessor conditionals (#if, #else, ...) are often used. Only more sophisticated software, like makedepend or scons, can give good information. gcc -E can help, but on a large project analyzing its output is a waste of time.

Johan Moreau