views: 367
answers: 3

I have an application that loads 170 files (let's say they are text files) from disk into individual objects and keeps them in memory the whole time. The memory is allocated once, when I load those files from disk, so there is no memory fragmentation involved. I also use FastMM to make sure my application never leaks memory.

The application compares all these files with each other to find similarities. Over-simplified, you could say it compares text strings, but the algorithm is far more complex because I have to allow some differences between the strings. Each file is about 300KB; loaded in memory (in the object that holds it), it takes about 0.4MB of RAM. So the running app takes about 60MB of RAM (working set). It processes the data for about 15 minutes. The problem is that it generates over 40 million page faults.

Why? I have about 2GB of free RAM. From what I know, page faults are slow. How much are they slowing down my program? How can I optimize the program to reduce these page faults? I guess it has something to do with data locality. Does anybody know of some example algorithms for this (Delphi)?

Update:
Looking at the number of page faults (no other application in Task Manager comes even close to mine), I guess I could increase the speed of my application IF I manage to optimize the memory layout (reduce the page faults).


Delphi 7, Win 7 32 bit, RAM 4GB (3GB visible, 2GB free).

+2  A: 

Caveat - I'm only addressing the page faulting issue.

I cannot be sure, but have you considered using memory-mapped files? That way Windows will use the files themselves as the paging store (rather than the main paging file, pagefile.sys). If the files are read-only, the number of page faults should theoretically decrease, as the pages won't need to be written out to disk via the paging file; Windows will just load the data from the file itself as needed.
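
A minimal sketch of what that could look like in Delphi, using the Win32 API from the Windows unit (the name MapFileReadOnly is mine, not from your code); the caller calls UnmapViewOfFile on the returned pointer when done:

    uses
      Windows, SysUtils;

    // Map a file read-only; pages of the view are backed by the file itself,
    // so read faults are satisfied from the file rather than from pagefile.sys.
    function MapFileReadOnly(const FileName: string; out Size: Cardinal): Pointer;
    var
      hFile, hMap: THandle;
    begin
      Result := nil;
      Size := 0;
      hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
      if hFile = INVALID_HANDLE_VALUE then
        RaiseLastOSError;
      try
        Size := GetFileSize(hFile, nil);
        hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
        if hMap = 0 then
          RaiseLastOSError;
        try
          Result := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
          if Result = nil then
            RaiseLastOSError;
        finally
          CloseHandle(hMap);  // the live view keeps the mapping object alive
        end;
      finally
        CloseHandle(hFile);   // the mapping keeps the file data accessible
      end;
    end;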

Now, to reduce the amount of paging in and out, you need to try to go through the data in one direction, so that as new data is read, older pages can be discarded forever. This is where you trade off re-reading the files against caching data - the cache has to be stored somewhere.

Note that memory-mapped files are how Windows loads .dlls and .exes, amongst other things. I've used them to scan through gigabyte files without hitting memory limits (we had MBs of RAM in those days, not GBs).

However, from the data you describe, I'd suggest that not having to go back over the files will reduce the amount of repaging going on.

Preet Sangha
+1  A: 

On my machine, most page faults are reported for Developer Studio, which shows about 4M page faults after 30+ minutes of total CPU time. You get 10 times more in half the time, and memory is scarce on my system. So 40M faults seems like a lot.

It could just be that you have a memory leak.

The working set is only the physical memory in use by your application. If you leak memory and don't touch it, it will get paged out; you will see the virtual memory usage (or page file use) increase. These pages might be swapped back in when the memory manager walks the heap, only to be swapped out again by Windows.

Because you have a lot of RAM, the swapped-out pages will stay in physical memory, as nobody else needs them. (A page recovered from RAM counts as a soft fault; one read back from disk counts as a hard fault.)

jdv
@jdv - No, it is not leaking. I am very careful about that. Plus, I use FastMM to catch leaks. The memory consumption is pretty much constant (60MB).
Altar
So have you verified the VIRTUAL memory footprint of your app? It cannot be 60MB as well; it is always bigger than the working set. If you are unsure what I mean, run Process Explorer and add the Virtual Size column. You can download Process Explorer from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
jdv
A: 

Do you use an exponential resize scheme when growing your buffers?

If you grow a block of memory in increments that are too small while loading, your code might constantly request a larger block from the system, copy the data over, and then release the old block (assuming that FastMM (de)allocates very large blocks directly from the OS).

Maybe this somehow causes a loop where the OS releases memory from your app's process and then adds it back, causing page faults on the first write.
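
For illustration, a minimal sketch of geometric (doubling) growth; the names TGrowBuf and AppendData are mine, not from your code:

    type
      TGrowBuf = array of Byte;

    // Grow the buffer by doubling, so the number of reallocations (and the
    // copying they cause) stays logarithmic in the final size.
    procedure AppendData(var Buf: TGrowBuf; var Count: Integer;
      const Chunk; ChunkSize: Integer);
    var
      NewCap: Integer;
    begin
      if Count + ChunkSize > Length(Buf) then
      begin
        NewCap := Length(Buf);
        if NewCap = 0 then
          NewCap := 4096;
        while NewCap < Count + ChunkSize do
          NewCap := NewCap * 2;   // double instead of growing by ChunkSize
        SetLength(Buf, NewCap);   // one large reallocation instead of many small ones
      end;
      Move(Chunk, Buf[Count], ChunkSize);
      Inc(Count, ChunkSize);
    end;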

Also avoid the TStringList.Load* methods for very large files; IIRC these consume twice the space needed.
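
As a rough alternative sketch, you can read a whole file into one string with a single allocation (LoadWholeFile is just an example name; with Delphi 7's AnsiString, one character is one byte):

    uses
      Classes, SysUtils;

    // Read the entire file into a single string in one allocation, instead of
    // letting TStringList build a string per line plus its internal list.
    function LoadWholeFile(const FileName: string): string;
    var
      FS: TFileStream;
      Len: Integer;
    begin
      FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
      try
        Len := FS.Size;           // the files here are only a few hundred KB
        SetLength(Result, Len);
        if Len > 0 then
          FS.ReadBuffer(Result[1], Len);
      finally
        FS.Free;
      end;
    end;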

Marco van de Voort
Hi Marco. It is not that. Please see my (recently) updated post.
Altar