I have a rather simple C++ project that uses the boost::regex library. The executable I get is 3.5 MB in size. As I understand it, I'm statically linking all of the Boost .cpp files, including every function and method. Is there a way to instruct the linker to pull in only the necessary parts of Boost rather than all of them? Thanks.

$ c++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)

This is what size says:

$ size a.out
__TEXT  __DATA  __OBJC  others  dec hex
1556480 69632   0   4296504912  4298131024  100304650

I tried strip:

$ ls -al
...  3946688 May 21 13:20 a.out
$ strip a.out
$ ls -al
...  3847248 May 21 13:20 a.out

ps. This is how my code is organized (maybe this is the main cause of the problem):

// file MyClass.h
class MyClass {
public:  // f() must be public for main() to call it
  void f();
};
#include "MyClassImpl.h"

// file MyClassImpl.h
void MyClass::f() {
  // implementation...
}

// file main.cpp
#include "MyClass.h"
int main(int ac, char** av) {
  MyClass c;
  c.f();
}

What do you think?

+2  A: 

If you are statically linking then most linkers will only include the objects that are needed.

3.5 MB is not that big on a PC system, and the size can depend on the OS, among other things.

Mark
My source code is just 10 KSLOC, and boost-regex is another 6 KSLOC. How could this code (16 KSLOC) produce an executable of 3.5 MB?
Vincenzo
@Vin Templates, the runtime, and preprocessor expansion all add up. LOC is not a very meaningful measure.
aaa
All this commotion about 3.5 MB...
BlueRaja - Danny Pflughoeft
+9  A: 

Did you compile with debugging symbols enabled? That could account for a large portion of the size. Also, how are you determining the size of the binary? Assuming you're on a UNIX-like platform, are you using a plain "ls -l" or the "size" command? The two can give greatly different results if the binary contains debugging symbols. For example, here are the results I get when building the Boost.Regex "credit_card_example.cpp" example.

$ g++ -g -O3 foo.cpp -lboost_regex-mt

$ ls -l a.out 
-rwxr-xr-x 1 void void 483801 2010-05-20 10:36 a.out

$ size a.out
   text    data     bss     dec     hex filename
  73330     492     336   74158   121ae a.out

Similar results occur when just generating the object file:

$ g++ -c -g -O3 foo.cpp

$ ls -l foo.o 
-rw-r--r-- 1 void void 622476 2010-05-20 10:40 foo.o

$ size foo.o
   text    data     bss     dec     hex filename
  49119       4      40   49163    c00b foo.o

EDIT: Added some static linking results ...

Here's the binary size when statically linking. It's closer to what you're getting:

$ g++ -static -g -O3 foo.cpp -lboost_regex-mt -lpthread

$ ls -l a.out 
-rwxr-xr-x 1 void void 2019905 2010-05-20 11:16 a.out

$ size a.out 
   text    data     bss     dec     hex filename
1204517    5184   41976 1251677  13195d a.out

It's also possible that much of the large size is coming from other libraries the Boost.Regex library depends on. On my Ubuntu box, the dependencies for the Boost.Regex shared library are the following:

$ ldd /usr/lib/libboost_regex-mt.so.1.38.0 
        linux-gate.so.1 =>  (0x0053f000)
        libicudata.so.40 => /usr/lib/libicudata.so.40 (0xb6a38000)
        libicui18n.so.40 => /usr/lib/libicui18n.so.40 (0x009e0000)
        libicuuc.so.40 => /usr/lib/libicuuc.so.40 (0x00672000)
        librt.so.1 => /lib/tls/i686/cmov/librt.so.1 (0x001e2000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x001eb000)
        libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0x00110000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x009be000)
        libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0x00153000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x002dd000)
        /lib/ld-linux.so.2 (0x00e56000)

The ICU libraries can get quite large. Besides debugging symbols, perhaps they are the primary contributors to the size of your binary. Furthermore, in the statically linked case, it looks like the Boost.Regex library itself is comprised of large object files:

$ size --totals /usr/lib/libboost_regex-mt.a | sort -n
      0       0       0       0       0 regex_debug.o (ex /usr/lib/libboost_regex-mt.a)
      0       0       0       0       0 usinstances.o (ex /usr/lib/libboost_regex-mt.a)
      0       0       0       0       0 w32_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
   text    data     bss     dec     hex filename
    435       0       0     435     1b3 regex_raw_buffer.o (ex /usr/lib/libboost_regex-mt.a)
    480       0       0     480     1e0 static_mutex.o (ex /usr/lib/libboost_regex-mt.a)
   1543       0      36    1579     62b cpp_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
   3171     632       0    3803     edb regex_traits_defaults.o (ex /usr/lib/libboost_regex-mt.a)
   5339       8      13    5360    14f0 c_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
   5650       8      16    5674    162a wc_regex_traits.o (ex /usr/lib/libboost_regex-mt.a)
   9075       4      32    9111    2397 regex.o (ex /usr/lib/libboost_regex-mt.a)
  17052       8       4   17064    42a8 fileiter.o (ex /usr/lib/libboost_regex-mt.a)
  61265       0       0   61265    ef51 wide_posix_api.o (ex /usr/lib/libboost_regex-mt.a)
  61787       0       0   61787    f15b posix_api.o (ex /usr/lib/libboost_regex-mt.a)
  80811       8       0   80819   13bb3 icu.o (ex /usr/lib/libboost_regex-mt.a)
 116489       8     112  116609   1c781 instances.o (ex /usr/lib/libboost_regex-mt.a)
 117874       8     112  117994   1ccea winstances.o (ex /usr/lib/libboost_regex-mt.a)
 131104       0       0  131104   20020 cregex.o (ex /usr/lib/libboost_regex-mt.a)
 612075     684     325  613084   95adc (TOTALS)

You could get up to ~600K coming from Boost.Regex alone if some or all of those object files get linked into your binary.

Void
+1 Try using boost::spirit with `-O2 -g`. 250LoC --> 20M, no joke. The symbols are so long, it crashes valgrind. Template debug symbols don't mess around.
academicRobot
How can I disable debug information in the output?
Vincenzo
Don't compile with the -g or -ggdb flag. Alternatively, run `strip -g` on your executable to strip out the debugging symbols. (Note that debugging info isn't loaded when you run the executable, so keeping it in the file has no impact on your RAM.)
nos
I don't compile with -g. MacOS strip doesn't have -g option… :(
Vincenzo
seems strip -S would do it on MacOSX
nos
The same size of binary after strip -S :(
Vincenzo
+1 for mentioning the boost.regex dependencies to other boost libraries
Holger Kretzschmar
Also try `strip -d` or `strip --strip-debug`. What version of strip do you have (`strip -V`)?
academicRobot
+2  A: 

If you have your link order set correctly (most dependent followed by least dependent), the linker should only grab symbols that your program actually uses. Additionally, a lot of Boost functionality (but not all, and I can't speak for regex) is header-only due to template use.

More likely is that debugging information/symbol table/etc is taking up space in your binary. Template names (for example iostream and standard containers) are very long and create large entries in the symbol table.

You don't say what OS you're using, but if it's a Unix variant, you can strip a copy of your binary as a test to remove all the extra info and see what's left:

cp a.out a.out.test
strip a.out.test
ls -l a.out*

On one binary I tested it removed about 90% of the file size. Note that if you do this any cores will be pretty useless without a copy of the unstripped binary to debug against - you won't have any symbol names or anything, just assembly and addresses. 3.5 MB is really a tiny file in modern times. Most likely there just is that much debugging/symbol information even from only 10Ksloc of source.

Mark B
No need to strip the binary to determine the size without debugging symbols and related cruft. Just use the `size` command, as described in my answer.
Void
+5  A: 

The -O3 flag optimizes your code for execution speed, not for size. Some of its optimizations, such as loop unrolling, can produce a bigger file. Try compiling with a different optimization flag. The -Os flag optimizes for a small executable.

Lucas
That's a good point, too. Compiling the Boost.Regex "`credit_card_example.cpp`" example with `-Os` dropped the size of the binary by about 20K in both the dynamically and statically linked cases (see my answer for `-O3` results). Regardless, I'd be surprised if it dropped @Vincenzo's binary size significantly. Certainly it's worth a try.
Void
+1, I didn't know that -Os option made the size smaller, definitely good info to know for the future!
shuttle87
A: 

If you have ldd available, you can use it to check whether you are really linking against all of the Boost libraries.

Another possibility is that the size is a side effect of using header-only libraries. Many Boost libraries are of this kind, and including them can inline more code than you would believe. You can also cause a kind of combinatorial explosion by using several different template parameters.

To get a better diagnosis, you should try to create a really short program that uses regex and see what size you get. If your program is really short, 3.5 MB is quite large. My current project's executable also uses Boost (but not regex) and is about the same size, but I'm talking about around 20,000 lines of C++. Hence there should be a catch somewhere.
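As a rough illustration of the template-parameter point, here is a minimal sketch. The helper `count_equal` and the function `distinct_instantiations_demo` are hypothetical names made up for this example and are not part of Boost. The idea is only that one template, used with three different type combinations, asks the compiler to generate three separate instantiations, each of which can add code and symbol-table entries to the binary.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from Boost): counts the elements of `c`
// that compare equal to `needle`.
template <typename Container, typename Value>
std::size_t count_equal(const Container& c, const Value& needle) {
    std::size_t n = 0;
    for (const auto& item : c) {
        if (item == needle) ++n;
    }
    return n;
}

// Each call uses a different (Container, Value) pair, so the compiler
// instantiates count_equal three times, once per type combination.
std::size_t distinct_instantiations_demo() {
    std::vector<int> ints{1, 2, 2, 3};
    std::vector<std::string> words{"a", "b", "a"};
    std::string text = "banana";
    return count_equal(ints, 2)                  // vector<int>, int
         + count_equal(words, std::string("a"))  // vector<string>, string
         + count_equal(text, 'a');               // string, char
}
```

Whether each instantiation ends up as a separate out-of-line function or gets inlined at the call site depends on the compiler and optimization level, so the actual size impact is best measured with `size` on a small test program.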

kriss
A: 

You say you have 3 files. To me, MyClassImpl.h should probably be a .cpp, since it contains the implementation.

Anyway, if you are actually compiling two files that include boost::regex, you will end up paying for boost::regex twice (more precisely, if you use the same functionality in both files, you will pay its cost in space twice).

This is due to the fact that most Boost functionality consists of inlined templates.
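For the file layout in the question, a minimal sketch of one way to keep the implementation in a header is shown here, collapsed into a single listing. It assumes a return type of `int` for `f()` (the original returns `void`) purely so the result can be checked, and `use_my_class` is a hypothetical caller added for illustration. The relevant part is the `inline` keyword: without it, including the header from two .cpp files would normally produce a duplicate-symbol error at link time.

```cpp
// --- MyClass.h (sketch) ---
class MyClass {
public:
    int f();  // assumption: returns int here; the original returns void
};

// --- MyClassImpl.h (sketch) ---
// `inline` allows this definition to appear in every translation unit
// that includes the header without violating the one-definition rule.
inline int MyClass::f() {
    return 42;  // placeholder for the real implementation
}

// --- any .cpp file that includes MyClass.h (hypothetical caller) ---
int use_my_class() {
    MyClass c;
    return c.f();
}
```

The alternative is to rename MyClassImpl.h to MyClass.cpp, drop the `#include "MyClassImpl.h"` from the header, and compile that file once, so the definition exists in exactly one object file.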

best,

Ugo