views: 182 · answers: 9

Our next product has grown too large to link on a machine running 32-bit Windows. The total size of all the .lib files exceeds 2 GB, so it can only be linked on a 64-bit Windows machine. Eventually we will exceed even that limit, since our software tends to grow rather than shrink and we are using a 32-bit linker (MS Visual Studio 2005): we expect to hit trouble when our total .lib size exceeds 3 GB.

How can I reduce the size of the .lib files, or the .obj files, without trimming code? For example, we use a lot of templates: is there any way of reducing their footprint? Is there any way of finding out what's causing the bloat by examining the .lib/.obj files? Can this be automated rather than inspected by eye? 2.5 GB is a lot of text to peer through and compare.

External constraints prevent us from shipping as anything other than a single .exe, so a DLL solution is not available.

+2  A: 

OMFG!!!!! That's huuuuuge!

Apart from the fact that I think it's too big to be rational... can't you use dynamic linking to avoid linking all the mess at compile time and only link at runtime what's necessary (I mean, loading DLLs on demand)?

helios
Yes, it is big. We suspect template bloat. In particular, we suspect that inline definitions of class template member functions are being repeated with the same template parameters many times over. We want to verify or disprove that, or hear any other suggestions for early reduction of .lib size.
hatcat
@hatcat: I don't think inlined template members are the problem. On the contrary, I'd expect the non-inlined template instantiations to be the big offenders. Imagine you have a `vector<int>` and a `vector<unsigned>`: any member function *not* inlined will exist in two copies. And the ones that aren't inlined are the longer functions that contribute noticeably to overall code size.
jalf
Yes, I wasn't very clear there. We suspect the inline failures to be the cause and want to reduce those, perhaps by specialising and defining non-inline.
hatcat
+5  A: 

I once worked on a project with several MLoC. While ours would still link on a 32-bit machine, link times were abysmal and became a major problem, because developers could only get a dozen edit-compile-test cycles done per workday. (Compile times were handled pretty well by doing distributed compilation.)

We switched to dynamic linking. That increased startup time, but this could be managed by delay-loading of DLLs.

sbi
I'd love to do this, but we must ship as a single .exe image. We do develop as a batch of DLLs, with a special retail configuration which links statically.
hatcat
@hatcat: Have you tried to compile/link without debug info? Yeah, PITA, I know, but something's gonna have to give.
sbi
A: 

If it can't link on 32-bit, what are the chances it can run on 32-bit?

If it were my problem to solve, I'd start looking for large divisions in the program that could be handled by separate processes: break apart the program into modules based on the libraries that different sections of the program need.

If this is really hard to do, perhaps the library APIs could be changed / massaged / split to make it easier to split the program into multiple interacting processes, preferably along clearly understandable functional code divisions. (One process == one task.)

sarnold
When you have >2GB of libraries, that doesn't necessarily mean your application gets >2GB.
sbi
Really, with LTCG on, the compiler produces enormous volumes of data - something like 100 megabytes - while the produced executable turns out to be around 512K.
sharptooth
+4  A: 

First, of course, make sure you compile with the 'Optimize for Size' option. If you do that, I wouldn't expect inlining, at least, to contribute significantly to the code size. The compiler makes a tradeoff for every inlining candidate: how much (if at all) it would increase code size, compared to the performance boost it would give. And if you're optimizing for size, the compiler won't risk bloating the code much. (Note that inlining very small functions can actually decrease code size.)

Second, have you considered unity builds? That'd pretty much eliminate the linker entirely, and with only one translation unit, there'd be much less duplicate work and hopefully, a smaller memory footprint.

Finally, I know Visual Studio (or possibly the Windows SDK) has a 64-bit compiler (that is, a compiler that is itself a 64-bit application, not just a compiler producing 64-bit code). Consider using that. (I don't know if there is also a 64-bit linker)

I don't know if the linker is built with the LARGEADDRESSAWARE flag set. If so, running it on a 64-bit machine will let the process consume a full 4 GB of memory instead of the 2 GB it normally gets. (If necessary, you can add the flag yourself by modifying the PE header.)

Perhaps limiting the linkage of various symbols could help as well. If you know that a symbol won't be needed outside of the current translation unit, put it in an anonymous namespace. That might allow the compiler to trim down unused symbols before passing everything on to the linker.

jalf
Thanks for your helpful suggestions. Persuading the lead programmer to turn off "LTCG" and "Optimize for Speed" are non-trivial tasks. Speed is the top priority here. We are using unity builds - we have a selection of unity compilation units to facilitate effective use of Xoreax IncrediBuild (go buy it). There are about half a dozen compilation units per library. Additionally, the compiler complains if you put too many source files in a compilation unit ("too many references" ...?). Finally, it's the linker that needs to be 64-bit. Such a thing doesn't exist in the MS tool set.
hatcat
Interestingly, replacing "maximise speed" with "minimise size" in the optimisation options of the biggest offender made no difference to the size at all: it remained at 459,205 KB (yes, half a gig). I'm inclined to believe that the bulk of the binary is information about the code rather than code itself, since it does all ultimately link down to about 20 MB.
hatcat
@hatcat: LTCG? I suppose this is bad for lib sizes. "Optimize for Speed", OTOH... I have read somewhere that usually "Optimize for Size" brings more speed advantages, because of the smaller working set of the application. Since this was accompanied by the info that "we at MS" do this (note: I am _not_ working for MS), it must have been either a blog posting or an article somewhere in MSDN. Try to find that; it might persuade your lead programmer.
sbi
@hatcat: One more thing: Are you by chance generating debug info even for release builds?
sbi
Oh, and I second the recommendation of [Xoreax' IncrediBuild](http://www.xoreax.com/). Excellent tool, excellent support, excellent bang-for-buck. And, no, I'm not working for them. They just deserve to be praised.
sbi
@hatcat: persuading the lead programmer to make those changes should be trivial the moment the linker runs out of memory. ;) Slow software is better than no software. ;) And as @sbi said, "optimize for speed" doesn't necessarily reduce performance.
jalf
We are indeed generating debug info for all configurations. And it looks like the changes recommended so far are having no bearing on the problem. I need to get the LTCG switch-off right though. Really, I want to concentrate on discovering which symbols are being repeatedly defined for the time being, hence the original question about cracking the .lib/.obj file format.
hatcat
Added a few more suggestions. I know it doesn't help you figure out the symbol information from .libs, but it might still help.
jalf
You know, the anonymous namespace idea might be a goer. If, as we believe, there's too much template shenanigans going on, we could put those in anonymous namespaces and prefix the names with the current namespace and an underscore if there are any collisions (unlikely). Will try this on Monday. Thanks.
hatcat
@hatcat: no need to do any prefix trickery. You can nest anonymous and named namespaces just fine, so it shouldn't produce any new name collisions. Just add the anonymous one inside the innermost namespace, for example.
jalf
+2  A: 

Does it need to be one big app?

One option is to split various modules into DLLs and load/unload them as needed.

Alternatively, you might be able to split it into several apps and share data using mapped memory, pipes, a DBMS, or even simple data files.

Michael J
It does indeed need to be one big app.
hatcat
Why is it necessary?
Michael J
Well, we're shipping a game with a lot (really, a lot) of proprietary technology. If we shipped the DLLs as they stand, that technology would be much easier to reverse engineer, as the interfaces would be defined. Even if we obfuscated the identifiers, RE folk would have a significant head start. They are a determined and fanatical bunch.
hatcat
@hatcat - A single exe is not *much* harder to RE than a DLL, unless you use COM or something else that is self-documenting. If you want to keep the proprietary stuff in a single exe you can offload some more mundane stuff into DLLs. If you need to pass sensitive details you can encrypt them. You can also check the version (and even the hash) of a DLL before you load it if you want to prevent people substituting their own DLLs (and the DLL can do the same for the exe before it will work).
Michael J
A: 

I do not think there is any single tool that can give you the statistics you want/need. Using either .map files or the dumpbin utility with the /SYMBOLS parameter, plus some post-processing of the resulting log, might help you get what you want.

If the statistics confirm your suspicion of template bloat, or even without the confirmation, it might be a good idea to do several things with the source:

  1. Try using explicit instantiations and move the template definitions into .cpp files. Of course this works only if you have a limited and well-known set of types/values that you use as arguments to the templates.
  2. Add more abstraction and/or indirection. Factor code that does not depend on your template parameters into their own base classes or free functions. If you have several template type parameters, see if you cannot split the single class template into several base classes without overlapping template parameters. (See http://www2.research.att.com/~bs/SCARY.pdf.)
  3. Try using the pimpl idiom; avoid instantiating templates in headers if you can, instantiate them only in .cpp files.
  4. Templates are nice, but sometimes ordinary classes work as well; e.g. avoid passing integer constants as non-type template parameters if you can pass them as constructor parameters instead.
wilx
+2  A: 

First of all, find out how to measure the size used by the various features. Don't just go ahead and start replacing template usage or other things because you suspect that they make a significant difference.

Run

dumpbin /HEADERS <somebinary>

to find out which sections of your binary are causing the huge size. Do you have a huge Debug Directory section? Then strip symbols. Is the Import Address Table large? Check the table and locate symbols which you don't need (a problem with templates is that the symbols of template instantiations tend to be very, very large). Similar analysis can be done for the Exception Directory, COM Descriptor Directory, etc.

Frerich Raabe
I would love to use DumpBin but I'm getting unexpected output. Just asking another question...
hatcat
A: 

I am highly skeptical that your actual code is 2 GB; more likely you are compiling in a lot of information. Consider offloading some of that information into a resource file, and embedding it in the executable as a separate step.

Brian
Indeed, it isn't 2 GB. There are no resources in the code base, only code. It is the .obj/.lib files which are enormous - the final .exe reduces to about 20 MB.
hatcat
+1  A: 

Try using the Symbol Sort program to show you where the main bits of bloat are in your code. Also just looking at the size of the raw .obj files will give you a reasonable idea of where to target.

the_mandrill
That looks VERY handy. I'm not convinced it will solve our particular problem, as I believe it's related to private symbols which are removed at link time; having said that, I'm not a DumpBin expert (getting there though) so I may be missing something. Nonetheless, I shall add that to the armoury. Thanks for the pointer.
hatcat
Given that your problem is that the individual libraries (pre-link optimised) are very large, then this may point you in the direction of the classes which are the most bloated. I would bet that there would be some very large template classes that are instantiated separately for different types. I've come across classes like this where most of the implementation could be moved into a helper function and the templated class could be really slimmed down.
the_mandrill
Props to jalf for a lot of help. SymbolSort (and the accompanying blog post) is very clear on this matter. All I need to do now is work out why DumpBin is telling me my object files have no symbols. Time for another question...
hatcat