I had to modify some open source code to use in a C project. Instead of building a library from the modified code, I'd like to just compile and build an executable from my own source combined with the modified open source code. The goal is to have a stand-alone package that can be distributed. I can get this to work just fine using the GNU build tools and have successfully built my executable.

Now I'd like to pare down the amount of code I am building and linking. Is there an easy way to determine which of the open source files I actually need to compile? There are, say, 40 .c files in the open source package. I'm guessing my code only uses (or causes to be used) 20-ish of those files. Currently I'm compiling all of them and throwing everything at the linker. There has to be a smart (and easy?) way to determine which ones I actually need, right?

I'm happy to provide further details if it's helpful. Thanks in advance.

+2  A: 

See if you can get your dead-code stripper (--gc-sections in GNU ld) to tell you which functions/symbols it eliminated during the link. Then you will know which source code you can safely remove. The GNU linker's -Map option (passed through gcc as -Wl,-Map=mapfile) may be useful on that front. You could, for instance, link once without dead-code stripping, then link again with dead-code stripping and compare the two map files.
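Here is a minimal sketch of that two-link comparison. It assumes GNU ld (or a compatible linker) on an ELF target and a gcc-style driver reachable as cc or $CC. The function name demo_gc_maps and the three tiny source files are made up for illustration.

```shell
# Sketch only: assumes GNU ld (or a compatible linker) on an ELF target.
# demo_gc_maps, main.c, used.c and unused.c are hypothetical names.
demo_gc_maps() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        printf 'int used_fn(void) { return 1; }\n' > used.c
        printf 'int unused_fn(void) { return 2; }\n' > unused.c
        printf 'int used_fn(void);\nint main(void) { return used_fn() - 1; }\n' > main.c

        # Put each function and data item in its own section so the linker
        # can discard them individually.
        "${CC:-cc}" -c -ffunction-sections -fdata-sections \
            main.c used.c unused.c || exit 1

        # Link 1: no dead-code stripping; full.map describes everything linked in.
        "${CC:-cc}" -o full main.o used.o unused.o -Wl,-Map=full.map || exit 1

        # Link 2: discard unreferenced sections and log each removal.
        "${CC:-cc}" -o stripped main.o used.o unused.o \
            -Wl,--gc-sections -Wl,--print-gc-sections \
            -Wl,-Map=stripped.map > gc.log 2>&1 || exit 1
    ) || return 1
    # Print the scratch directory so the caller can inspect the results.
    echo "$dir"
}
```

Call it as dir=$(demo_gc_maps), then read "$dir/gc.log" or diff "$dir/full.map" against "$dir/stripped.map". An object file whose code sections all show up as removed is a candidate for dropping from the build.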

If there are only 40 source files maximum, is this optimization really worth your time?

Carl Norum
I think it is worthwhile to avoid extraneous files in a distribution. Obviously not a deal-breaker, but would be nice to have a cleaner package.
Joey
+2  A: 

When faced with this, I've either taken the final link command, stripped out all of the objects, and then added them back in one at a time until the link works, or processed the output of the nm command.
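The trial-and-error approach can be automated. The sketch below is a variation rather than exactly what I did by hand: it starts from the full object list and drops each object whose removal still lets the link succeed. prune_objects is a hypothetical helper, and its first argument is the name of a command or shell function (also an assumption) that returns 0 when the objects passed to it link cleanly. It assumes file names without spaces.

```shell
# prune_objects LINK_CMD OBJ...
# Prints the objects that are still needed for LINK_CMD to succeed.
# Hypothetical helper; assumes object file names contain no whitespace.
prune_objects() {
    link_cmd=$1
    shift
    keep="$*"
    for candidate in "$@"; do
        # Build the current list without the candidate object.
        trial=""
        for obj in $keep; do
            [ "$obj" = "$candidate" ] || trial="$trial $obj"
        done
        # $trial is intentionally unquoted so it splits into separate arguments.
        if "$link_cmd" $trial >/dev/null 2>&1; then
            keep=$trial    # the link still works, so the candidate is not needed
        fi
    done
    # Unquoted echo normalizes the spacing of the surviving list.
    echo $keep
}
```

With a real toolchain, the link command could be a small wrapper such as try_link() { cc -o /tmp/trial "$@"; }, called as prune_objects try_link *.o. It performs one link per object, which is fine for a few dozen files but slow for thousands.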

Worked example:

Looking at the output of nm:

$ nm *.o

a.o:
00000000 T a
         U aa

b.o:
00000000 T b

t.o:
         U a
         U b
00000000 T main

ua.o:
00000000 T ua

ub.o:
00000000 T ub

So I created the following awk script:

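# find-unused.awk
# Reads the output of "nm *.o" and reports which object files define
# symbols that some object file references.

# main is referenced by the C runtime startup code, not by any .o here.
BEGIN { req["main"] = "crt" }

# A header line such as "a.o:" starts the listing for a new object file.
/\.o:$/ {
    sub(/:$/, "");
    modulename = $0;
    next;
}

# "U name": an undefined symbol that this object file needs from elsewhere.
NF == 2 && $1 == "U" {
    req[$2] = modulename;
}

# "address T name": a function defined in this object file's text section.
NF == 3 && $2 == "T" {
    def[$3] = modulename;
}

END {
    print "modules referenced:"
    for (i in req)
    {
        if (i in def)
            print "    " def[i];
    }

    print "functions not found"
    for (i in req)
    {
        if (!(i in def))
            print "    " i;
    }
}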

and then call it like this:

$ nm *.o|awk -f find-unused.awk

It tells me:

modules referenced:
    t.o
    a.o
    b.o
functions not found
    aa

That is right: the ua and ub functions in the example above are never referenced, so ua.o and ub.o do not appear in the list.
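One limitation of the script above is that it counts references from every object file, including ones that are themselves unused. A stricter variant walks the references outward from whichever object defines main. This is a rough sketch, not a tested tool: reachable_objects is a made-up name, it only follows plain "U" references, and it treats T, D, B and R symbols as definitions while ignoring weak and common symbols.

```shell
# reachable_objects: reads "nm *.o" output on stdin and prints the object
# files reachable from the one that defines main, in breadth-first order.
# Hypothetical helper; handles only U references and T/D/B/R definitions.
reachable_objects() {
    awk '
        # Header line such as "a.o:" starts a new object file.
        /\.o:$/ { sub(/:$/, ""); module = $0; next }

        # Undefined symbol: remember that this module needs it.
        NF == 2 && $1 == "U" { refs[module] = refs[module] " " $2 }

        # Defined global symbol: remember which module provides it.
        NF == 3 && $2 ~ /^[TDBR]$/ { def[$3] = module }

        END {
            if (!("main" in def)) {
                print "main is not defined in any object" > "/dev/stderr"
                exit 1
            }
            # Breadth-first walk over modules, starting at the one with main.
            queue[1] = def["main"]; seen[def["main"]] = 1; n = 1
            for (i = 1; i <= n; i++) {
                print queue[i]
                m = split(refs[queue[i]], syms, " ")
                for (j = 1; j <= m; j++) {
                    if (!(syms[j] in def)) continue   # external, e.g. libc
                    target = def[syms[j]]
                    if (!(target in seen)) { seen[target] = 1; queue[++n] = target }
                }
            }
        }
    '
}
```

Use it as nm *.o | reachable_objects. For the five object files in the example it prints t.o, a.o and b.o, one per line and each only once, so no sort -u is needed.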

Richard Harrison
This script was very handy. Thanks for the detailed answer. The only hitch in the script is that it lists a .o file once for every referenced function defined in that .o. I just used 'sort -u' to get the unique .o's and had my answer. Turns out that I'm using 105 of the 115 .o's I have. So I won't be removing any files. Thanks again.
Joey
It was only a quick script - could probably be improved - but I'm glad it was useful.
Richard Harrison