I'm working on a very large scale project where compilation time is very long. What tools can I use (preferably open source) on Linux to find the most heavily included files and optimize their usage? To be clearer, I need a tool which, given the dependencies, will show me which headers are included most often. By the way, we do use distributed compiling.
Tools like doxygen (used with the graphviz options) can generate dependency graphs for include files... I don't know if they'd provide enough overview for what you're trying to do, but it could be worth trying.
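For example, a minimal Doxyfile fragment that turns on the include graphs might look like this (HAVE_DOT requires Graphviz to be installed; the INPUT paths are placeholders you'd replace with your own source directories):
# Doxyfile fragment: enable include/included-by graphs via Graphviz
HAVE_DOT          = YES
INCLUDE_GRAPH     = YES
INCLUDED_BY_GRAPH = YES
RECURSIVE         = YES
# placeholder paths, adjust to your tree
INPUT             = src include
Running doxygen with this configuration produces, for each file, a graph of what it includes and a graph of what includes it, which gives a visual sense of which headers fan out the widest.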
This is not exactly what you are searching for, and it might not be easy to set up, but maybe you could have a look at LXR: lxr.linux.no is a browsable kernel tree.
In the search box, if you enter a filename, it will show you where it is included. But this is still approximate, and it does not track chained dependencies.
Maybe:
strace -f -e trace=open,openat -o outfile make
grep 'some handy regex to match headers' outfile
(-f is needed because the actual opens happen in the compiler processes spawned by make, and recent libc versions open files with openat rather than open.)
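As a sketch of what that grep stage could look like, assuming the build was traced as above and that headers end in .h or .hpp (outfile and the extensions are just placeholders):
grep -Eo '"[^"]+\.(h|hpp)"' outfile \
| sort | uniq -c | sort -rn | head -20
This pulls the quoted paths out of the traced open calls and ranks headers by how often they were actually opened during the build.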
If you wish to know which files are included most often, use this bash command:
find . -name '*.cpp' -exec egrep '^[[:space:]]*#include[[:space:]]*["<][[:alpha:][:digit:]_./]+[">]' {} \; \
| sort | uniq -c | sort -rn \
| head -20
It will display the top 20 files ranked by the number of times they were included.
Explanation: the 1st line finds all *.cpp files and extracts the lines with an "#include" directive from them. The 2nd line counts how many times each file was included, and the 3rd line keeps the 20 most included files.
Using the Unix philosophy of "gluing together many small tools", I'd suggest writing a short script that calls gcc with the -M (or -MM) and -MF (OUTFILE) options (as detailed here). That will generate the dependency lists for the make tool, which you can then parse easily (relative to parsing the source files directly) and extract the required information.
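A minimal sketch of such a script, assuming GCC is available and the sources live under src/ (the path and the absence of -I flags are assumptions you'd adjust for a real tree):
# Emit "object: source header..." dependency lines for each file,
# split the output into one name per line, keep only headers, count them.
for f in src/*.cpp; do
    gcc -MM "$f"
done \
| tr ' \\' '\n\n' \
| grep -E '\.(h|hpp)$' \
| sort | uniq -c | sort -rn | head -20
Unlike a plain grep over the sources, this counts transitive includes too, since gcc's dependency output lists every header that actually ends up being pulled in.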
The answers here point you to tools that track #include dependencies, but none of them addresses the optimization part of the question.
Aside: The book "Large Scale C++ Software Design" should help.
From the root level of the source tree, do the following (\t is the tab character):
find . -type f -exec grep '[ \t]*#include[ \t][ \t]*["<][^">]*[">]' {} ';' \
| sed 's/^[ \t]*#include[ \t][ \t]*["<]//' \
| sed 's/[">].*$//' \
| sort \
| uniq -c \
| sort -r -k1 -n
Line 1 gets all the include lines. Line 2 strips off everything before the actual filename. Line 3 strips off the end of the line, leaving only the filename. Lines 4 and 5 count each unique line. Line 6 sorts by line count in reverse order.
Use ccache. It hashes the inputs to a compilation and caches the results, which will drastically increase the speed of these sorts of compiles.
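Typical usage is to put ccache in front of the compiler, and since you mentioned distributed compiling, ccache can forward cache misses to distcc through its CCACHE_PREFIX setting. A sketch, where the compiler names are just examples:
# Wrap the compilers with ccache for this build.
export CC="ccache gcc"
export CXX="ccache g++"
# Optional: hand cache misses off to distcc for distributed compiles.
export CCACHE_PREFIX=distcc
make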
If you want to detect multiple includes, so that you can remove them, you could use makedepend as Iulian Șerbănoiu suggests:
makedepend -m *.c -f - > /dev/null
will give a warning for each multiple include.
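To sweep an entire tree rather than one directory, you could feed makedepend the file list from find; a sketch, assuming the sources use the .c and .cpp extensions (the warnings go to stderr, so only the dependency output itself is discarded):
# Run makedepend over every source file in the tree;
# keep the multiple-include warnings, throw away the dependency lines.
find . -name '*.c' -o -name '*.cpp' \
| xargs makedepend -m -f - > /dev/null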
The bash scripts found on this page aren't a good solution: they only work on simple projects. In a large project, like the one described in the question, the C preprocessor (#if, #else, ...) is often used, so textual grepping misses what actually gets included. Only more sophisticated tools, such as makedepend or SCons, can give accurate information. gcc -E can help, but on a large project analyzing its output is a waste of time.
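If you do go the gcc -E route, one way to tame the output is to count only the line markers GCC emits when it enters an included file. A sketch, assuming a single translation unit (src/main.cpp is a placeholder) and GCC's '# linenum "file" flags' marker format, where flag 1 means "entering a file":
# Keep only the "entering file" markers, then rank headers by entry count.
gcc -E src/main.cpp \
| sed -n 's/^# [0-9][0-9]* "\(.*\)" 1.*/\1/p' \
| sort | uniq -c | sort -rn | head -20
Each filename then appears once per time it was entered during preprocessing, so the counts reflect real, post-#if inclusion rather than a textual match.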