In the static vs shared libraries debates, I've often heard that shared libraries eliminate duplication and reduce overall disk space. But how much disk space do shared libraries really save in modern Linux distros? How much more space would be needed if all programs were compiled using static libraries? Has anyone crunched the numbers for a typical desktop Linux distro such as Ubuntu? Are there any statistics available?

ADDENDUM:

All answers were informative and are appreciated, but they seemed to shoot down my question rather than attempt to answer it. Kaleb was on the right track, but he chose to crunch the numbers for memory space instead of disk space (my question was for disk space).

Because programs only "pay" for the portions of static libraries that they use, it seems practically impossible to quantitatively know what the disk space difference would be for all static vs all shared.

I feel like trashing my question now that I realize it's practically impossible to answer. But I'll leave it here to preserve the informative answers.

So that SO stops nagging me to choose an answer, I'm going to pick the most popular one (even if it sidesteps the question).

+8  A: 

I'm not sure where you heard this, but reduced disk space is mostly a red herring as drive space approaches pennies per gigabyte. The real gain with shared libraries comes with security and bugfix updates for those libraries; applications using static libraries have to be individually rebuilt with the new libraries, whereas all apps using shared libraries can be updated at once by replacing only a few files.

Ignacio Vazquez-Abrams
True, but what about API breaks? Managing multiple versions of an SO (or DLL) is not fun on any system.
Earlz
@earlz: When the API breaks the library's SONAME is incremented to the next second digit so that older executables can continue using the older library. And handling multiple versions of a library is not difficult with modern package managers.
Ignacio Vazquez-Abrams
"I'm not sure where you heard this, but reduced disk space is mostly a red herring as drive space approaches pennies per gigabyte." Amen. Vote up.
Anders
+6  A: 

Not only do shared libraries save disk space, they also save memory, and that's a lot more important. The prelinking step is important here... you can't share the memory pages between two instances of the same library unless they are loaded at the same address, and prelinking allows that to happen.

Andrew McGregor
+1 I thought this would be the case, but am glad someone else can confirm!
Peter Recore
They only save memory if two processes, running at the same time, reference the same shared library. Hence the word 'shared'. If you're running multiple copies of the same program I can see this happening, but otherwise I'm very doubtful it happens in the real world.
Jay
@Jay: Almost all apps run using libc.
Ignacio Vazquez-Abrams
Shared libraries don't save RAM. Each process loads its own copy of the library in its address space. See here: http://msdn.microsoft.com/en-us/library/h90dkhs0(VS.80).aspx I believe it holds true for Linux also.
Ioan
@Ioan Shared objects can save RAM. Having a copy of a library in a process's address space does not mean that there is one physical copy of the library for each process in memory. Rather, the dynamic linker maps the pages for the shared library into the address space, and uses copy-on-write so that if a process changes memory in the shared library, it gets its own copy. The pages containing code for the shared library probably won't get changed, and therefore won't get copied.
P-Nuts
Exactly. Recent Linux distributions do that, and so does OS X. Just because there's a copy showing up in each process's memory map does not mean that those are separate copies of the data; the map could be (and is) set up to point multiple tasks at the same physical pages.
Andrew McGregor
Oops, missed that. Certainly makes sense that read-only code isn't copied, just the writable section.
Ioan
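
The copy-on-write behaviour described in the comments above can be observed with an ordinary file. This is only a minimal sketch, assuming a POSIX system with a writable /tmp; the function name and the temporary file are made up for illustration, and it uses a plain data file rather than a real shared library. It maps the file with MAP_PRIVATE, writes through the mapping, and checks that the file itself is untouched, which is why unmodified pages can stay shared between processes.

```cpp
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: returns true if a write through a MAP_PRIVATE mapping
// changed only this process's view and left the underlying file unchanged.
bool privateWriteLeavesFileUntouched() {
    char path[] = "/tmp/cow_demo_XXXXXX";   // assumed scratch location
    int fd = mkstemp(path);
    if (fd < 0) return false;

    const char original[] = "shared";
    const ssize_t len = static_cast<ssize_t>(sizeof original);
    bool ok = write(fd, original, sizeof original) == len;

    void* p = MAP_FAILED;
    if (ok) {
        // MAP_PRIVATE: pages are shared until written, then copied privately.
        p = mmap(nullptr, sizeof original, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
        ok = (p != MAP_FAILED);
    }

    if (ok) {
        char* view = static_cast<char*>(p);
        view[0] = 'S';                       // forces a private copy of this page

        char onDisk[sizeof original] = {};
        ssize_t n = pread(fd, onDisk, sizeof onDisk, 0);
        bool fileUnchanged = (n == len) &&
                             std::memcmp(onDisk, original, sizeof original) == 0;
        ok = fileUnchanged && view[0] == 'S';
        munmap(p, sizeof original);
    }

    close(fd);
    unlink(path);
    return ok;
}
```

Pages that are never written, such as library code, never take the copying path, so a single physical copy can back every process that maps them.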
+2  A: 

OK, perhaps not an answer, but the memory savings are what I'd consider. The savings depend on the number of times a library is loaded after the first application, so let's find out how much is saved per library on the system using a quick script:

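#!/bin/sh
# Estimate the memory saved by sharing: for every library that lsof reports
# as open, multiply its file size by the number of extra times it is loaded.
# Note: the last library in the sorted list is never reported.

lastlib=""
cnt=1
# One line per open mapping, sorted so that identical paths are adjacent.
lsof | grep 'lib.*\.so$' | awk '{print $9}' | sort | while read -r lib ; do
    if [ "$lastlib" = "$lib" ] ; then
        cnt=$((cnt + 1))
    else
        if [ -n "$lastlib" ] ; then
            # Use the size of the library just counted, not the next one.
            size=$(ls -l "$lastlib" | awk '{print $5}')
            savings=$(( (cnt - 1) * ${size:-0} ))
            echo "$lastlib: $savings"
        fi
        cnt=1
    fi
    lastlib="$lib"
done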

That will give us savings per lib, as such:

...
/usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0
/usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640
/usr/lib64/qt4/plugins/imageformats/libqico.so: 791200
...

Then, the total savings:

$ ./checker.sh | awk '{total = total + $2}END{print total}'
263160760

So, roughly speaking on my system I'm saving about 250 Megs of memory. Your mileage will vary.

Kaleb Pederson
Be careful about counting plugins. I would only count libraries with a SONAME.
Ignacio Vazquez-Abrams
BTW, the above script will miss any savings from the last library in the list.
Kaleb Pederson
@Ignacio Why's that?
Kaleb Pederson
@Kaleb: Plugins are typically private to a single application. I suppose it depends on the type of plugin though...
Ignacio Vazquez-Abrams
+2  A: 

I was able to figure out a partial quantitative answer without having to do an obscene amount of work. Here is my (hare-brained) methodology:

1) Use the following command to generate a list of packages with their installed size and list of dependencies:

dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'

2) Parse the results and build a map of statistics for each package:

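#include <map>
#include <string>

struct PkgStats
{
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;          // installed size in KB
    int dependentCount;  // number of packages that directly depend on this one
};

typedef std::map<std::string, PkgStats> PkgMap;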

Where dependentCount is the number of other packages that directly depend on that package.
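
Steps 1 and 2 could be sketched roughly as follows. This is not the code used for the results below, only a minimal illustration under stated assumptions: input lines have the tab-separated form produced by the dpkg-query command above, only the first alternative of an "a | b" dependency is counted, version constraints in parentheses are dropped, and architecture qualifiers are not handled. The helpers trim, dependencyName and buildPkgMap are hypothetical names.

```cpp
#include <map>
#include <sstream>
#include <string>

struct PkgStats {
    int kbSize = 0;          // installed size in KB
    int dependentCount = 0;  // packages that directly depend on this one
};
typedef std::map<std::string, PkgStats> PkgMap;

// Hypothetical helper: strips whitespace from both ends of a string.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: reduces "libc6 (>= 2.4) | libc6.1" to "libc6".
// Assumption: only the first alternative counts as a dependency.
static std::string dependencyName(const std::string& item) {
    return trim(item.substr(0, item.find_first_of("(|")));
}

// Reads lines of the form "package<TAB>installedKB<TAB>dep, dep, ..."
// and fills in the size and direct-dependent count for each package.
PkgMap buildPkgMap(std::istream& in) {
    PkgMap pkgs;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, size, depends;
        std::getline(fields, name, '\t');
        std::getline(fields, size, '\t');
        std::getline(fields, depends);
        if (name.empty()) continue;

        size = trim(size);
        pkgs[name].kbSize = size.empty() ? 0 : std::stoi(size);

        std::istringstream items(depends);
        std::string item;
        while (std::getline(items, item, ',')) {
            std::string dep = dependencyName(item);
            if (!dep.empty()) ++pkgs[dep].dependentCount;
        }
    }
    return pkgs;
}
```

A dependency that is named but not installed ends up in the map with a size of 0, so it contributes nothing to any size-based total.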

Results

Here is the Top 20 list of packages with the most dependants on my system:

Package             Installed KB    # Deps  Dup'd MB
libc6               10096           750     7385
python              624             112     68
libatk1.0-0         200             92      18
perl                18852           48      865
gconf2              248             34      8
debconf             988             23      21
libasound2          1428            19      25
defoma              564             18      9
libart-2.0-2        164             14      2
libavahi-client3    160             14      2
libbz2-1.0          128             12      1
openoffice.org-core 124908          11      1220
gcc-4.4-base        168             10      1
libbonobo2-0        916             10      8
cli-common          336             8       2
coreutils           12928           8       88
erlang-base         6708            8       46
libbluetooth3       200             8       1
dictionaries-common 1016            7       6

where # Deps is the number of packages that directly depend on the package, and Dup'd MB is the number of megabytes that would be duplicated if there were no sharing (= installed_size * (dependants_count - 1), for dependants_count > 1).
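
As a worked check of that formula, here is a small sketch. The function name dupMB is made up, and the conversion (divide KB by 1024 and round to the nearest MB) is an assumption inferred from the table values rather than something stated above.

```cpp
#include <cmath>

// Hypothetical helper: megabytes duplicated if each direct dependent carried
// its own copy instead of sharing one. Assumes 1 MB = 1024 KB, rounded to
// the nearest MB.
long dupMB(long installedKB, long dependents) {
    if (dependents <= 1) return 0;  // one user or none: nothing is duplicated
    return std::lround(installedKB * (dependents - 1) / 1024.0);
}
```

With the libc6 row, 10096 KB * (750 - 1) / 1024 comes to about 7384.7, which rounds to the 7385 MB shown in the table.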

It's not surprising to see libc6 on top. :) BTW, I have a typical Ubuntu 9.10 setup with a few programming-related packages installed, as well as some GIS tools.

Some statistics:

  • Total installed packages: 1717
  • Average # of direct dependents: 0.92
  • Total duplicated size with no sharing (ignoring indirect dependencies): 10.25GB
  • Histogram of # of direct dependents (note logarithmic Y scale): [image: Histogram]

Note that the above totally ignores indirect dependencies (i.e. everything should at least be indirectly dependent on libc6). What I really should have done is build a graph of all dependencies and use that as the basis for my statistics. Maybe I'll get around to it sometime and post a lengthy blog article with more details and rigor.
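
For what it's worth, the graph-based count could look something like the sketch below. It was not used for any of the numbers above. It assumes a reverse-dependency map (package name to the packages that directly depend on it) has already been built; ReverseDeps and countAllDependents are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Assumed input: package -> packages that directly depend on it.
typedef std::map<std::string, std::vector<std::string> > ReverseDeps;

// Counts every package that depends on pkg, directly or indirectly, using a
// breadth-first walk over the reverse-dependency graph.
std::size_t countAllDependents(const ReverseDeps& rdeps, const std::string& pkg) {
    std::set<std::string> seen;      // dependents found so far
    std::queue<std::string> pending; // packages whose dependents are unvisited
    pending.push(pkg);
    while (!pending.empty()) {
        std::string cur = pending.front();
        pending.pop();
        ReverseDeps::const_iterator it = rdeps.find(cur);
        if (it == rdeps.end()) continue;
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            const std::string& d = it->second[i];
            // Skip pkg itself (possible with cycles) and anything already seen.
            if (d != pkg && seen.insert(d).second) pending.push(d);
        }
    }
    return seen.size();
}
```

Replacing the direct count with this one in the Dup'd MB formula would account for indirect dependents too, although it would still overestimate, since statically linked programs only pay for the parts of a library they use.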

Emile Cormier
+2  A: 

Shared libraries do not necessarily save disk space or memory.

When an application links to a static library, only those parts of the library that the application uses will be pulled into the application binary. The library archive (.a) contains object files (.o), and if they are well factored, the application will use less memory by only linking with the object files it uses. Shared libraries will contain the whole library on disk and in memory whether parts of it are used by applications or not.

For desktop and server systems, this is less likely to result in a win overall, but if you are developing embedded applications, it's worth trying to statically link all the applications to see if that gives you an overall saving.

camh