views: 189
answers: 7

I am working on a .NET 3.5 application, designed specifically for a high-powered PC, that does lots of data manipulation and computation. I recently encountered a need for a 4000 x 5000 two-dimensional array of objects, which is very large for a 32-bit PC and gives me an OutOfMemoryException. The only way to avoid using an array like this is to go down a very complex, time-consuming road filled with pain and suffering.
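For scale, here is a rough back-of-the-envelope estimate of what the straightforward array costs (MyObject is just a placeholder name for my element type):

MyObject[,] grid = new MyObject[4000, 5000];
// 4000 * 5000 = 20,000,000 references, at 4 bytes per reference on 32-bit,
// is roughly 76 MB that must be allocated as one contiguous block,
// before counting any of the non-null objects the cells point to.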

Are there any tips or tricks that the pros use to deal with large working sets of RAM? Do you know of any libraries that would be helpful (specifically for .Net)? Is there a way to force Windows to allocate more RAM for my process?

EDIT: The array will contain mostly null references; I am using it to keep track of adjacent objects. Since most of the entries are null, I assume there is a more efficient approach to keeping track of adjacent objects, finding the neighbors of a given object, and so on.

+1  A: 

Well, one thought is to scrap the two-dimensional array for a database instead. Something like SQLite has a small footprint and can be easily deployed with the application. There's even a C# wrapper for it.

SQLite stores the data in a single file, so reads and writes go through the disk and could take a performance hit. How much of a hit depends on the nature of the application: lookups via an index should be fast, for example, but massive calculations across the entire database would definitely be slower. So... I don't know, but maybe it's worth considering.
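A rough sketch of what that could look like, assuming the System.Data.SQLite wrapper (the table layout and column names are just illustrative):

using System.Data.SQLite;

using (SQLiteConnection conn = new SQLiteConnection("Data Source=cells.db"))
{
    conn.Open();
    using (SQLiteCommand cmd = conn.CreateCommand())
    {
        // One database row per non-null cell, keyed by its coordinates.
        cmd.CommandText = "CREATE TABLE IF NOT EXISTS cells " +
                          "(x INTEGER, y INTEGER, data TEXT, PRIMARY KEY (x, y))";
        cmd.ExecuteNonQuery();

        // Indexed lookup of a single cell.
        cmd.CommandText = "SELECT data FROM cells WHERE x = @x AND y = @y";
        cmd.Parameters.AddWithValue("@x", 42);
        cmd.Parameters.AddWithValue("@y", 17);
        object data = cmd.ExecuteScalar();
    }
}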

Steve Wortham
I'm not very experienced with databases, though I can figure it out. I am using the array to keep track of adjacent objects; is it very easy (and efficient) to do that sort of thing with a database?
Phil
Yes, it definitely can be. The database engine is capable of indexing your data, and that index provides a super fast way of finding a record even though it's stored in the file system. There's a speed comparison page here describing several scenarios with SQLite... http://www.sqlite.org/speed.html
Steve Wortham
Why do people seem to think that RDBMSes are the solution to everything?
Nick Johnson
@Nick - I certainly don't. It was just my first idea since it has the advantage of drastically reducing memory consumption. If by using something like a Generic Dictionary you're able to reduce memory consumption enough for the application to be usable then perhaps that's the better solution.
Steve Wortham
@Nick - I also answered this question before we found out that most of the elements in the array are null.
Steve Wortham
@Nick they aren't the solution for everything, just the things you can't use regular expressions for.
Pete Kirkham
@Steve: My point is that switching from a purely memory based, and fairly straightforward algorithm to using an RDBMS is going to add orders of magnitude to the runtime. Hence why it's such a poor idea.
Nick Johnson
+6  A: 

Judging by your comments, I think I can now answer your question. If most of the references are null, then you can hash the keys into a table that in turn points to your elements. Lookup time in a hash map is constant, O(1), and you won't have to worry about key collisions because each [x,y] pair is unique. You also won't have to worry much about memory consumption, since most of the references are null and only the occupied cells need to be stored.
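A minimal sketch of that idea using a generic Dictionary keyed by the packed coordinates (Widget is a placeholder for the real element type):

using System.Collections.Generic;

class SparseCells
{
    private readonly Dictionary<long, Widget> cells = new Dictionary<long, Widget>();

    // Each unique [x,y] pair packs into a single 64-bit key.
    private static long Key(int x, int y)
    {
        return ((long)x << 32) | (uint)y;
    }

    public void Set(int x, int y, Widget value)
    {
        cells[Key(x, y)] = value;
    }

    // Average O(1) lookup; returns null for the (majority) empty cells.
    public Widget Get(int x, int y)
    {
        Widget w;
        return cells.TryGetValue(Key(x, y), out w) ? w : null;
    }
}

class Widget { /* the real object's data goes here */ }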

Chris H
This is a good idea... I may try this.
Phil
I presume you mean he won't have to worry about the keys being non-unique. He'll still have to worry about collisions in the hashtable.
Nick Johnson
If you're going this route, I'd check out the generic Dictionary class here: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx It uses a hash table behind the scenes, so lookups should be fast. And since it's generic, you can strongly type both the key and the value.
Steve Wortham
A: 

Is the array fixed, i.e. do the values in the array never change? If so, it might be worth dumping the array contents to disk and using a memory-mapping technique instead; you can then load a portion of the dumped array into a memory map for reading. It won't work, though, if the data and elements in the array change.
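Note that .NET 3.5 has no built-in memory-mapped-file class (System.IO.MemoryMappedFiles only arrived in .NET 4), so without P/Invoke the closest simple sketch is to seek into the dumped file and read back one row at a time (the record size and names below are assumptions):

using System.IO;

// Assumed layout: the 4000 x 5000 grid dumped row by row as fixed-size records.
static class GridDump
{
    const int Cols = 5000;
    const int RecordSize = 16;   // bytes per element -- an assumption

    // Reads back a single row of the dump instead of keeping the whole grid in memory.
    public static byte[] ReadRow(string path, int row)
    {
        byte[] buffer = new byte[Cols * RecordSize];
        using (FileStream fs = File.OpenRead(path))
        {
            fs.Seek((long)row * Cols * RecordSize, SeekOrigin.Begin);
            int offset = 0;
            while (offset < buffer.Length)
            {
                int n = fs.Read(buffer, offset, buffer.Length - offset);
                if (n == 0) break;   // reached end of file
                offset += n;
            }
        }
        return buffer;
    }
}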

just my 2cents...

Hope this helps, Best regards, Tom.

tommieb75
But memory mapping still takes up address space and is still therefore limited to 2-4 GB on 32-bit.
dsimcha
@dsimcha: that's true, but you can specify a portion of it to load in memory
tommieb75
A: 

There are two 'easy' directions, at the OS or process level.

  1. Add the /3GB switch to your boot.ini, and mark your app /LARGEADDRESSAWARE. You immediately gain an additional 1 GB of virtual address space, but not without a tradeoff. There's a good chance it's the right choice for you (a rough sketch of both steps follows this list).
  2. Often the problem is not lack of memory, but rather its fragmentation - which seems relevant to your context too (huge contiguous arrays). A while back I put online some techniques that helped me combat fragmentation in native code - they should be at least partially applicable to managed code.
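A rough sketch of option 1 (the exact boot.ini entry varies per machine, and for a managed exe the large-address-aware bit is typically set on the built binary with editbin as a post-build step):

In boot.ini (32-bit Windows), append /3GB to the existing OS entry, for example:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP" /fastdetect /3GB

Then mark the executable as large-address-aware after each build:

editbin /LARGEADDRESSAWARE MyApp.exe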
Ofek Shilon
+1  A: 

You can efficiently store a grid-like structure where the majority of elements are null in a sparse array. They can be implemented in different ways, but commonly use modified linked lists for the rows and columns. There is a good introduction to the subject here.
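A minimal sketch of one simple variant, using a dictionary of rows rather than the linked-list representation described above (T stands in for the element type):

using System.Collections.Generic;

// Sparse grid as a dictionary of rows; only the occupied cells use memory.
class SparseGrid<T> where T : class
{
    private readonly Dictionary<int, Dictionary<int, T>> rows =
        new Dictionary<int, Dictionary<int, T>>();

    public void Set(int row, int column, T value)
    {
        Dictionary<int, T> cols;
        if (!rows.TryGetValue(row, out cols))
        {
            cols = new Dictionary<int, T>();
            rows[row] = cols;
        }
        cols[column] = value;
    }

    // Returns null for the (majority) empty cells.
    public T Get(int row, int column)
    {
        Dictionary<int, T> cols;
        T value;
        if (rows.TryGetValue(row, out cols) && cols.TryGetValue(column, out value))
        {
            return value;
        }
        return null;
    }
}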

Jon
+1  A: 

If the majority of your elements are null, then perhaps you don't really need to create an array at all.

Jon suggests one approach that'll work - implementing a sparse array using linked lists. Here's another:

public struct CellLocation
{
   public int Row;
   public int Column;
}

public class Element
{
   public Element(int row, int column)
   {
      Location = new CellLocation {Row = row, Column=column};
   }

   public CellLocation Location { get; private set; }

   // your class's other properties and methods go here
}

Now you can store Element objects in a Dictionary<CellLocation, Element>. In fact, I'd put that dictionary into a class of its own, so that it can implement methods like:

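// Assumption: "Cells" below is that class's Dictionary<CellLocation, Element> field.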
public IEnumerable<Element> AdjacentElements(Element elm)
{
   for (int row = -1; row <= 1; row++)
   {
      for (int column = -1; column <= 1; column++)
      {
         // elm isn't adjacent to itself
         if (row == 0 && column == 0)
         {
            continue;
         }
         CellLocation key = new CellLocation { 
            Row=elm.Location.Row + row, 
            Column=elm.Location.Column + column 
         };
         if (!Cells.ContainsKey(key))
         {
            continue;
         }
         yield return Cells[key];
      }
   }
}

There are operations for which this can be faster than a sparse array. To find the element at a single row and column, a sparse array still has to do a linear search to find the row, and then another linear search to find the column in that row, whereas this method can find an element with one lookup into a hash table.

There are also circumstances in which it will be substantially slower. To find all the elements in a row requires as many hash-table lookups as there are cells in the row, while doing it with a sparse array just entails traversing a linked list.

Robert Rossney
This is eerily similar to the solution I have right now. +1.
Phil
You are going to need to override CellLocation.GetHashCode(), I think.
Phil
A: 

It sounds like what you are actually building is an adjacency matrix. If that's the case, and the underlying graph is sparse, then you would probably be better off switching to an adjacency list: http://en.wikipedia.org/wiki/Adjacency_list
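For example, a minimal sketch (Node is a placeholder for whatever your objects are):

using System.Collections.Generic;

// Each object keeps its own neighbor list instead of a 4000 x 5000 matrix.
class Node
{
    public readonly List<Node> Neighbors = new List<Node>();

    // Linking two nodes is just two list additions.
    public void Connect(Node other)
    {
        Neighbors.Add(other);
        other.Neighbors.Add(this);
    }
}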

Joel