tags:
views: 169
answers: 6
I have a fairly simple C# application that builds a large hashtable. The keys of this hashtable are strings, and the values are ints.

The program runs fine until around 10.3 million items have been added to the hashtable, at which point an out-of-memory error is thrown on the line that adds an item to the hashtable.

According to Task Manager, my program is only using 797 MB of memory, and there's still over 2 GB available. It's a 32-bit machine, so I know only a total of 2 GB can be used by one process, but that still leaves about 1.2 GB that the hashtable should be able to expand into.

Why would an out of memory error be thrown?

+3  A: 

It is probably due to memory fragmentation: you still have free memory, but it is not contiguous. Memory is divided into pages, usually 4 KB in size, so if you allocate 4 MB you will need 1024 contiguous pages in your process's address space (they need not be physically contiguous, as memory is virtualized per process).

However, the memory for the hashtable doesn't have to be contiguous (unless it is very badly implemented), so maybe it is some limit of the memory manager...

Lorenzo
+6  A: 

In theory you get 2 GB for the process, but in reality each allocation needs a contiguous block within that, so if your process's address space is fragmented you effectively get less than that.

Additionally, I suspect the hashtable, like most collection types, will by default roughly double in size when it needs to grow, causing a huge allocation when the tipping-point item is added.

If you know the size it needs to be ahead of time (or have a reasonable over-estimate), it may help to specify the capacity in the constructor, as in the sketch below.
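A minimal sketch of pre-sizing the collection, assuming an over-estimate of 17 million entries (the figure Paul mentions in the comments); the number and names are only illustrative:

    using System.Collections;
    using System.Collections.Generic;

    class Preallocate
    {
        static void Main()
        {
            const int expectedItems = 17000000; // hypothetical over-estimate

            // Pre-sizing avoids repeated re-allocation and copying as the table grows.
            var table = new Hashtable(expectedItems);
            table["example-key"] = 1;

            // A generic Dictionary<string, int> stores the int values without boxing,
            // which also reduces memory pressure in a 32-bit process.
            var dict = new Dictionary<string, int>(expectedItems);
            dict["example-key"] = 1;
        }
    }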

Alternatively, if it's not crucial that everything stays in memory, some sort of database solution may be better and give you more flexibility if the data does reach the point where it can't fit in memory.

Davy8
Not exactly: every process gets its own private 2 GB of virtual address space. The fragmentation is *inside* the process, because two pages that are contiguous in your process's address space may not actually be contiguous in physical memory.
Lorenzo
Thanks... I've set it running again with a 17 million capacity, and we'll see how it gets on. The data is coming from the database, but I can't run the procedure itself there, as recursive queries are very slow. I could insert every instance of (item, node) into the database for it to process afterwards, but I expect it'd be very slow due to network speeds.
Paul
@Lorenzo Thanks, updated to reflect that it's memory fragmentation within the process and not the system.
Davy8
@Paul if that doesn't work and network speed is the issue with using the current database, maybe a lightweight local temporary database such as SQLite would be appropriate?
Davy8
A: 

You should use something like a hybrid between an array and a linked list: a linked list takes more memory per item than an array, but an array needs contiguous memory space. A rough sketch follows.
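A minimal illustration of that hybrid idea, assuming fixed-size array chunks held in a list (the class name and chunk size are made up for the example):

    using System.Collections.Generic;

    class ChunkedList<T>
    {
        // Each chunk is a small contiguous array, so the runtime never needs
        // one huge contiguous block for the whole collection.
        private const int ChunkSize = 16 * 1024;
        private readonly List<T[]> _chunks = new List<T[]>();
        private int _count;

        public void Add(T item)
        {
            if (_count % ChunkSize == 0)
                _chunks.Add(new T[ChunkSize]); // grow by one small chunk at a time
            _chunks[_count / ChunkSize][_count % ChunkSize] = item;
            _count++;
        }

        public T Get(int index)
        {
            return _chunks[index / ChunkSize][index % ChunkSize];
        }
    }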

TcKs
+2  A: 

Use Process Explorer (www.sysinternals.com) and look at the Virtual Address Space of your process. In contrast with "Private Bytes" (which is the amount of memory taken by the process), the Virtual Address Space shows the highest memory address in use. If fragmentation is high, it will be much higher than the "Private Bytes".
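If it's easier to log the same kind of numbers from inside the application, System.Diagnostics.Process exposes them; a rough sketch (the labels are just my own glosses):

    using System;
    using System.Diagnostics;

    class MemoryReport
    {
        static void Main()
        {
            Process p = Process.GetCurrentProcess();

            // Roughly corresponds to "Private Bytes" in Process Explorer.
            Console.WriteLine("Private bytes: {0:N0}", p.PrivateMemorySize64);

            // Size of the virtual address space currently in use (reserved + committed).
            Console.WriteLine("Virtual size:  {0:N0}", p.VirtualMemorySize64);

            // Roughly what Task Manager's default memory column shows.
            Console.WriteLine("Working set:   {0:N0}", p.WorkingSet64);
        }
    }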

If your application really needs that much memory:

  • Consider going to 64-bit
  • Mark the executable with the /LARGEADDRESSAWARE flag, which gives a 32-bit process 4GB of address space under a 64-bit operating system, and 3GB if a 32-bit Windows is booted with the /3GB option (one way to set it is sketched below).
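The flag isn't exposed in the C# project settings, so one common approach is to stamp it onto the built binary with editbin (from the Visual C++ / Windows SDK tools) as a post-build step; the path macro below is only an example:

    rem Post-build step: mark the built EXE as large-address-aware.
    editbin /LARGEADDRESSAWARE "$(TargetPath)"
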
Patrick
+1  A: 

You're simply looking at the wrong column. Have a look at the "Commit Size" column, this one should be around 2GB.

http://windows.microsoft.com/en-us/windows-vista/What-do-the-Task-Manager-memory-columns-mean

Lucero
This is Windows XP... I don't think it has "Commit Size" in Task Manager.
Paul
This is a nice answer; however, now the question becomes: why is the commit size so much bigger than the actual memory usage?
Lorenzo
Lorenzo: "Commit size" is the amount of memory reserved for use by the application (i.e. a memory allocation command was executed) -- doesn't necessarily mean it is actually being used. "Private working set" is the amount of memory that has been both allocated /and/ used by the process, that cannot be shared with other processes.
Warren
@Paul, OK, since I don't have an XP machine to check, I don't know in which version it is available. Anyway, the point is that only memory pages currently assigned to the application count towards the working set; pages which are swapped out or sparse don't get counted. Windows sometimes seems to favor the file cache over keeping processes resident, which means that your application may be partially swapped out even though you think there is free memory available.
Lucero
+1  A: 

The program you are running has fewer resources available because the Visual Studio debugger is trying to keep track of everything you're doing in your application (breakpoints, references, the stack, etc.).

In addition to that, you might have more objects still awaiting collection than you think: the garbage collector is tiered, and collects large objects far less often. A quick way to observe this is sketched after the diagram.

    +-------+
    | large |       collected less often (~1/10+ cycles)
  +-+-------+-+              |
  |   medium  |              |
+-+-----------+-+            V
|     small     |   collected more often (~1/3 cycles)
+---------------+

NOTE: The numbers are from memory, so take it with a grain of salt.
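If you want to see how often each tier actually runs, the GC exposes per-generation collection counts; a small sketch (the output format is just illustrative):

    using System;

    class GcStats
    {
        static void Main()
        {
            // Higher generations (where long-lived and large objects end up)
            // are collected far less often than generation 0.
            for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            {
                Console.WriteLine("Gen {0} collections so far: {1}",
                                  gen, GC.CollectionCount(gen));
            }

            Console.WriteLine("Total managed heap: {0:N0} bytes",
                              GC.GetTotalMemory(false));
        }
    }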

Tim
-1: What makes you think he's running in the debugger?
John Saunders
Let me clarify: running in Debug mode, not Release mode.
Tim