Hello,

I'm sure many have noticed that when you have a large application (i.e. something requiring a few MBs of DLLs) it loads much faster the second time than the first time. The same happens if you read a large file in your application. It's read much faster after the first time.

What causes this? I suppose it's the hard-drive cache, or is the OS adding some memory caching of its own?

What techniques do you use to speed-up the loading times of large applications and files?

Thanks in advance

Note: the question refers to Windows

Added: What determines the size of the OS cache? In some apps, files load slowly again after a minute or so. Does the cache fill up in a minute?

+1  A: 

Yep, anything read in from the hard drive is cached, so it will load faster the second time. The basic assumption is that it's rare to use a large chunk of data from the HD only once and then discard it (this is usually a good assumption in practice). Typically it's the operating system (kernel) that implements the cache, taking up a chunk of RAM to do so, although I'm not sure whether modern hard drives have some built-in cache capability of their own. (I once wrote a small kernel as an academic project; caching of HD data in memory was one of its features.)

David Zaslavsky
+9  A: 

Two things can affect this. The first is hard-disk caching (done both by the disk itself, which has little impact, and by the OS, which tends to have more impact). The second is that Windows (and other OSes) have little reason to unload DLLs when they're finished with them unless the memory is needed for something else. This is because DLLs can easily be shared between processes.

So DLLs have a habit of hanging around even after the applications that were using them disappear. If another application decides the DLL is needed, it's already in memory and just has to be mapped into the process's address space.

I've seen some applications pre-load their required DLLs (usually called QuickStart; I think both MS Office and Adobe Reader do this) so that the perceived load times are better.

paxdiablo
+1  A: 

One additional factor which affects program startup time is Superfetch, a technology introduced with (I believe) Windows XP. Essentially it monitors disk access during program startup, recognizes file access patterns and then attempts to "bunch up" the required data for quicker access (e.g. by rearranging the data sequentially on disk according to its loading order).

As the others mentioned, generally speaking any read operation is likely to be cached by the Windows disk cache, and reused unless the memory is needed for other operations.

Tomer Gabel
XP's rearrange-files is called Prefetch. Vista's load-things-into-memory-before-you-start-them is called SuperFetch.
Andrew Coleson
+4  A: 

I see two possibilities:

  • Preload your libraries at system startup, as already mentioned; Office, OpenOffice and others do just that.

I am not a great fan of that solution: it makes your boot time longer and eats lots of memory.

  • Load your DLLs dynamically (see LoadLibrary) only when needed (see the sketch below). Unfortunately this is not possible with every DLL.

For example, why load at startup a DLL that exports files to XYZ format when you are not sure it will ever be needed? Load it when the user actually selects that export format.

I have a dream where Adobe Acrobat uses this approach, instead of bogging me down with loads of plugins I never use every time I want to display a PDF file!

Depending on your needs you might have to use both techniques: preload some big, heavily used libraries and load specific plugins on demand only...
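A minimal sketch of the on-demand approach, assuming a hypothetical xyz_export.dll that exports an ExportToXyz function (both names are made up for illustration):

    #include <windows.h>
    #include <cstdio>

    // Hypothetical exporter interface: xyz_export.dll is assumed to export
    // a function "ExportToXyz" with this signature.
    typedef BOOL (__stdcall *ExportToXyzFn)(const wchar_t* path);

    bool ExportSelectedDocument(const wchar_t* path)
    {
        // Load the DLL only when the user actually picks the XYZ format.
        HMODULE mod = LoadLibraryW(L"xyz_export.dll");
        if (!mod) {
            fwprintf(stderr, L"xyz_export.dll not found (error %lu)\n", GetLastError());
            return false;
        }

        auto exportFn = reinterpret_cast<ExportToXyzFn>(
            GetProcAddress(mod, "ExportToXyz"));
        if (!exportFn) {
            FreeLibrary(mod);
            return false;
        }

        BOOL ok = exportFn(path);

        // Unloading afterwards is optional: keeping the DLL loaded trades
        // memory for a faster second export.
        FreeLibrary(mod);
        return ok == TRUE;
    }

The DLL stays out of memory until the export is actually requested, which keeps startup lean without giving up the feature.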

siukurnin
A: 

The system cache is used for anything that comes off disk. That includes file metadata, so if you are using applications that open a large number of files (say, directory scanners), then you can easily flush the cache if you also have applications running that eat up a lot of memory.

For the stuff I use, I prefer to use a small number of large files (>64 MB to 1 GB) and asynchronous, unbuffered I/O. And a good ol' defrag every once in a while.
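A minimal sketch of that kind of read, assuming a hypothetical bigfile.dat; FILE_FLAG_NO_BUFFERING bypasses the system cache and requires sector-aligned buffers and transfer sizes, while FILE_FLAG_OVERLAPPED lets the read proceed asynchronously:

    #include <windows.h>
    #include <cstdio>

    int wmain()
    {
        // Open the (hypothetical) file for unbuffered, asynchronous reads.
        HANDLE file = CreateFileW(L"bigfile.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING,
                                  FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                                  nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Unbuffered I/O needs sector-aligned buffers and transfer sizes;
        // VirtualAlloc returns page-aligned memory, which satisfies that.
        const DWORD chunk = 1 << 20; // 1 MB, a multiple of any common sector size
        void* buffer = VirtualAlloc(nullptr, chunk, MEM_COMMIT | MEM_RESERVE,
                                    PAGE_READWRITE);

        OVERLAPPED ov = {};
        ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);

        DWORD read = 0;
        if (!ReadFile(file, buffer, chunk, nullptr, &ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            return 1; // a real error, not just "still in flight"
        }

        // ... do other work while the read is in flight ...

        GetOverlappedResult(file, &ov, &read, TRUE); // wait for completion
        wprintf(L"read %lu bytes\n", read);

        CloseHandle(ov.hEvent);
        VirtualFree(buffer, 0, MEM_RELEASE);
        CloseHandle(file);
        return 0;
    }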

MSN
+2  A: 

One item that might be worth looking at is "rebasing". Each DLL has a preset "base" address that it prefers to be loaded into memory at. If an application is loading the DLL at a different address (because the preferred one is not available), the DLL is loaded at the new address and "rebased". Roughly speaking this means that parts of the DLL are patched up on the fly. This only applies to native images, as opposed to .NET assemblies.

This really old MSDN article covers rebasing: http://msdn.microsoft.com/en-us/library/ms810432.aspx

Not sure whether much of it still applies (it's a very old article)... but here's an enticing quote:

Prefer one large DLL over several small ones; make sure that the operating system does not need to search for the DLLs very long; and avoid many fixups if there is a chance that the DLL may be rebased by the operating system (or, alternatively, try to select your base addresses such that rebasing is unlikely).
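As a quick way to see whether a DLL in your process got rebased, you can compare its actual load address with the preferred base recorded in its PE header. A minimal sketch (user32.dll is just an example module, and on modern Windows ASLR will usually move system DLLs off their preferred base anyway):

    #include <windows.h>
    #include <cstdio>

    // Reports whether a loaded module ended up at its preferred base address
    // or had to be moved (rebased) by the loader.
    void ReportRebase(const wchar_t* name)
    {
        HMODULE mod = GetModuleHandleW(name);
        if (!mod) { wprintf(L"%ls is not loaded\n", name); return; }

        // The module handle is the actual load address; the PE headers record
        // the preferred base the linker chose.
        auto dos = reinterpret_cast<PIMAGE_DOS_HEADER>(mod);
        auto nt  = reinterpret_cast<PIMAGE_NT_HEADERS>(
            reinterpret_cast<BYTE*>(mod) + dos->e_lfanew);

        ULONGLONG preferred = nt->OptionalHeader.ImageBase;
        ULONGLONG actual    = reinterpret_cast<ULONGLONG>(mod);

        wprintf(L"%ls: preferred 0x%llx, actual 0x%llx -> %ls\n",
                name, preferred, actual,
                preferred == actual ? L"not rebased" : L"rebased");
    }

    int wmain()
    {
        ReportRebase(L"user32.dll"); // example: a DLL most GUI apps load
        return 0;
    }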

Btw, if you're dealing with .NET then NGEN'ing your app/DLLs should help speed things up (NGEN = native image generation).

+5  A: 

Windows' memory manager is actually pretty slick -- it services memory requests AND acts as the disk cache. With enough free memory on the system, lots of recently accessed files will reside in memory. Until the physical memory is needed for something else, those DLLs will remain in cache -- all courtesy of the Cache Manager.

As far as how to help, look into delay loading your DLLs. You get the advantages of calling LoadLibrary only when you need a DLL, but automatically, so you don't have to sprinkle LoadLibrary/GetProcAddress calls throughout your code (automatic in the sense that you only need to add a linker switch):

http://msdn.microsoft.com/en-us/library/yx9zd12s.aspx
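A minimal sketch of the delay-load setup, assuming a hypothetical export.dll/export.lib pair that provides an ExportReport function (the names are made up; the only change from normal implicit linking is the extra linker input and switch):

    // app.cpp -- calls a function from a hypothetical export.dll exactly as if
    // it were implicitly linked; the delay-load helper performs the actual
    // LoadLibrary/GetProcAddress on the first call.
    //
    // Build (assuming the import library export.lib exists):
    //   cl app.cpp export.lib delayimp.lib /link /DELAYLOAD:export.dll

    #include <cstdio>

    // Assumed to be declared in the DLL's header and exported from export.dll.
    extern "C" int ExportReport(const char* path);

    int main()
    {
        // export.dll has not been loaded yet at this point.
        std::puts("starting up without export.dll");

        // The first call triggers the delay-load helper, which loads the DLL now.
        return ExportReport("report.xyz");
    }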

Or you could pre-load like Office and others do (as mentioned above), but I personally hate that -- it slows down the computer at initial boot.

DougN
+1  A: 

NGEN'ing the assemblies might help with the startup time; however, runtime performance might be affected (sometimes the NGEN'd code is not as optimal as code the JIT compiles on demand).

NGENing can be done in the background as well: http://blogs.msdn.com/davidnotario/archive/2005/04/27/412838.aspx

Here's another good article, NGen and Performance: http://msdn.microsoft.com/en-us/magazine/cc163808.aspx

Timur Fanshteyn