I'm a complete beginner, and this is how I understand linking: Static linking copies only the code that is actually used into the executable. Dynamic linking uses .dlls that may contain a lot of code that is never used by the application. Please correct me if I'm wrong :)

Now here's my question. I'm using an open source library in my application, but I will implement only a part of its functionality. To get the final size of my executable + libraries as small as possible, should I use static or dynamic linking? How can I ensure that no unnecessary code is copied?

Thanks!

+2  A: 

If you use dynamic linking, you'll have to include the entire DLL with your distributed application, however large it may be. If you use static linking, the linker should* link only the functions you actually use into the executable, as you said above.

AFAIK, as long as you aren't using every function exported by the DLL, linking with the static library will always result in a smaller total than (executable + DLL). In other words, if you compare just the executables, the dynamically linked one will be smaller, but if you compare the whole package, the statically linked one will be smaller.

*I'm not aware of any linkers that do link every function in a library -- the big linkers (MS, GNU) will certainly link the minimal amount needed, but that's not to say there aren't some crappy linkers out there that will link everything in regardless of whether it's used.
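As a minimal sketch of how you might verify this yourself (the file and library names here are made up for illustration), put two functions in separate object files inside a static archive and observe that only the referenced one ends up in the executable. Note that classic linkers pull in whole object files from a .a archive; GCC's -ffunction-sections together with --gc-sections can tighten that to individual functions:

    /* main.c -- only used() is referenced; unused() lives in a
       separate object file inside the hypothetical libfoo.a */
    extern int used(void);

    int main(void)
    {
        return used();
    }

    /*
     * Build sketch (GNU toolchain; names are illustrative):
     *   gcc -c used.c unused.c
     *   ar rcs libfoo.a used.o unused.o
     *   gcc main.c -L. -lfoo -o app        # pulls in used.o only
     *
     * For function-level granularity:
     *   gcc -ffunction-sections -fdata-sections -c used.c unused.c
     *   gcc -Wl,--gc-sections main.c -L. -lfoo -o app
     *
     * Inspect the result with: nm app   or   size app
     */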

Mark Rushakoff
A: 

Static linking copies the code you reference, transitively, into your executable.

Dynamic linking leaves your program in more-or-less its original state, and hooks up to system libraries at run time.

To minimize the final size of the executable, use dynamic linking.

DigitalRoss
You're right using DLLs will result in a smaller executable, but he wants to minimize the size of the executable *plus libraries*. You're being confusing.
Joren
Yes, but that size (executable + libs) is the same in both cases, if you only count the code your program touches. If you count all the library code whether you use it or not, then sure, static will be smaller. But that other code is already there on the system, right? And it's being used by other programs, right? So its full accounting for size shouldn't be charged to any one program... If you think no one else on the system will be using that library, then you might want to do a combination of static and dynamic linking.
DigitalRoss
I should note that some people statically link programs they are distributing just to avoid DLL hell. It wastes some memory because the user now has two copies of printf(3) in RAM, etc., but the app may be more likely to work.
DigitalRoss
"But that other code is already there on the system, right?" Not necessarily. But in the cases where it is, I agree with your comments.
Joren
How about: link statically with your lib, then dynamically with the system libs. Best of both worlds.
DigitalRoss
Another reason to use static linking - the executable is installed with either the setuid or setgid bit set, and may be installed in user-chosen locations (not in the system directories). This prevents the use of LD_LIBRARY_PATH or equivalents, which means that shared libraries can't be found unless the user can fix /etc/ld.so.conf or install the shared objects in the system library directories, etc.
Jonathan Leffler
If there are 10 programs in the suite, and each statically links 50% of the library, you have 5 copies worth of library stored on disk, and 5 copies of the library in main memory if all ten programs are running. By contrast, if the 10 programs all use the shared library, then you have one copy of the text of each program, and one copy of the text of the (whole) library in memory - for a memory saving at runtime and a space saving on disk. If you have just one executable, then maybe static linking makes sense.
Jonathan Leffler
+4  A: 

When using dynamic linking you are in effect building [at least] two binaries: the program per se (.exe) and a DLL. For the exe, the compiler/linker can detect unused portions of the code and produce only the minimum necessary in the output. With the DLL, however, all the functions and variables marked as exported (and all the code needed to make them work) have to be included in the output. That is because the compiler/linker has no way of knowing which of these functions may be used by programs (some of them yet to be written...).

However, since it appears you'll be building both the exe and the dll(s), you can choose what gets exported, and hence include only the minimum necessary.

EDIT: on rereading I noticed that you are actually considering using an open source library, so the above statement requires some qualification.

If you build the open source code as-is (assuming the sources include a build for a DLL), it will likely export all publicly declared functions of the library. You may, however, alter the list of functions declared for export and hence get the smallest possible binary.
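One hedged sketch of how export lists are typically trimmed (the macro and function names below are made up; the exact mechanism depends on your toolchain): on Windows only symbols marked __declspec(dllexport), or listed in a .def file, appear in the DLL's export table, and with GCC you can build with -fvisibility=hidden and mark only the chosen entry points as visible:

    /* mylib.h -- illustrative export-control header */
    #if defined(_WIN32)
    #  define MYLIB_API __declspec(dllexport)
    #else
    #  define MYLIB_API __attribute__((visibility("default")))
    #endif

    MYLIB_API int needed_feature(int x);   /* exported */
    int internal_helper(int x);            /* stays internal when the
                                              library is built with
                                              -fvisibility=hidden */

Note that hiding a symbol only shrinks the export table; unreferenced code is actually removed only if the unused sources are left out of the build or the library is linked with section garbage collection.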

Using dynamic linking can lead to savings in the overall amount of binary needed, because several programs can make use of the same DLL. For example, a graph plotting application and a video game program can share the same graphic utilities DLL.

In general, the choice between dynamic and static linking is not such a critical one. This issue was more of a problem in the past, with slower CPUs (hence longer build times) and other limits on memory, hard disk space and distribution bandwidth (floppies! and such...).

A modern rule of thumb, in this day and age of gigabyte-sized storage, is to pick static linking by default, unless one of the following applies:

  • the DLL is a 3rd-party, publicly available DLL (i.e. one that the end users may update independently of your own update cycle)
  • Several portions of the application are typically not used, and the underlying logic can be "stowed away" in a DLL, making for a smaller overall run-time footprint (when the users are not using the specialized/advanced features).
  • the program is part of a suite of programs, several of them having enough commonality that the shared code can be factored out.
  • it is desirable to have multiple versions of the application. For example you may implement a basic/free/limited version of the application and a full-featured one. Assuming the program calls either version of the features through the same API, the distinct behaviors can be encapsulated in the DLLs alone, allowing paying users to simply download the "premium" DLL and replace the other one (no installation needed).
  • the software is beta, and you expect to send multiple revisions to the end user(s) (as above, a DLL swap rather than a re-install is nice).
  • different parts of the application are written in different languages. In this case it is often possible to use static linking (by forcing the compilers to agree on calling conventions and such), but the DLL approach may also ease this cooperation.
  • the software is produced by different programmers/teams. The DLL provides an implicit delineation of responsibilities.

There may be a few more cases, but again, unless there is an existing need for DLLs, static linking is just as good.

mjv
If you have just one program that you're distributing, then I can agree with static linking. If you are distributing a suite of programs which all could use a shared library, then you are very quickly wasting runtime memory as well as disk space. And even though machines do have lots of memory, you can almost invariably benefit from more efficient use of that memory - and shared libraries tend to promote more efficient use of that memory.
Jonathan Leffler
Agreed, cf. the third "unless" bullet in the response.
mjv
+1  A: 

To complicate matters even further, there are more options.

First, you can use a mixture of static and dynamic linking. It's normal to dynamically link against your target system's libc and libm (the basic C and math libraries). For most target platforms you can guarantee that those will be present on any functional system. If they aren't there then your application won't run... but practically nothing else would work either; there won't be any shell or script engine to attempt to launch your program.

From there it depends. For example, in UNIX/Linux programming using "curses" one would normally link dynamically to the curses or ncurses libraries. However, there have been some versions of curses which had optional libraries offering higher level abstractions (such as "pads"). You might be best off statically linking just those extra bits into your executable, so that you're imposing one less dependency on your users.
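To make that concrete (the add-on library name here is invented, and the -Bstatic/-Bdynamic toggles are specific to GNU ld), a single link line can mix the two modes:

    # link the optional add-on statically, everything else dynamically
    gcc app.o -Wl,-Bstatic -lpads_extra -Wl,-Bdynamic -lncurses -lm -o app

Everything that appears after -Wl,-Bdynamic (including the implicit -lc) is resolved from shared libraries again.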

Another form of dynamic linking is run-time dynamic linking (through dlopen() and related functions). In this case you selectively open and link to various libraries depending on your own configuration options, command processing, and so on. You see this in things like the Apache web server (it only tries to load the *mod_rewrite* module if the configuration file includes references to it) and in Perl and Python (which load .so shared objects via the DynaLoader bootstrap and the native import mechanism, respectively).

(Obviously the dependencies get interesting when you have Apache loading something like *mod_perl* or *mod_python* which, in turn, is running code that invokes other dlopen() operations ... some which may be loading XML libraries and so on).
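As a minimal sketch of that run-time approach (the plugin file name and symbol are made up for illustration, and error handling is kept to the bare minimum):

    /* POSIX run-time dynamic linking with dlopen()/dlsym() */
    #include <stdio.h>
    #include <dlfcn.h>

    int main(void)
    {
        /* Load the optional module only when it is actually wanted. */
        void *handle = dlopen("./libplugin.so", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        /* Look up an entry point by name. */
        int (*plugin_init)(void) = (int (*)(void))dlsym(handle, "plugin_init");
        if (plugin_init)
            plugin_init();

        dlclose(handle);
        return 0;
    }

    /* Build sketch: gcc main.c -ldl -o app */

This is how the programs mentioned above defer their optional dependencies: the main binary only hard-depends on libdl, and everything else becomes a run-time decision.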

Usually you'll just dynamically link your applications and document the resulting dependencies (and perhaps provide installation scripts or packaging as appropriate to your supported targets). That will be sufficient for most projects. Special cases are when you're compiling something like a kernel, or when the target is an embedded system, etc.

If you want to read more about creating your own custom linker scripts then the best starting point would probably be: Using ld: The GNU Linker.

Jim Dennis