tags:
views: 283
answers: 1

My goal is to create an efficient structure to store the most relevant entries of a matrix that would (in a world without memory limitations) be approximately 10^5 x 10^5 and filled with doubles. The matrix is symmetric, so it would actually contain only about (10^10)/2 distinct values.

I need to access entries many, many times in my simulation, so fast retrieval is critical.

To keep the structure manageable, I will delete entries that are unlikely to be used. If the index is (int_x1, int_x2), I will often want to delete all pairs containing a given index, e.g. all pairs containing x1.

What is the best structure or set of structures for this task? What is a good hash for two ints?

For portability, I would like to avoid Boost. I am currently using TR1's unordered_map elsewhere in the program. I was thinking of using unordered_map again with a pair of ints as the key, but I'm not sure how I would be able to delete entries efficiently that way, and I don't know what a good hash function would look like.

I'm a beginning programmer, so please state the obvious.

+1  A: 

If the data is going to be pretty sparse, you could use an array of hash tables.

// One hash table per row: matrix[x] maps column index y -> value.
std::tr1::unordered_map<int, double>* matrix =
    new std::tr1::unordered_map<int, double>[100000];

Then to look up a value (x,y), you index the array with x and look up y in the hash table.

A few things to watch out for:

  • Deleting can get pretty expensive, as you have to iterate through a lot of the hash tables.
  • Total storage can grow as you delete/insert; you should trim() your hash_maps occasionally.
  • It should be easy to take advantage of symmetry.
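Not from the answer itself, but a minimal sketch of this layout in C++11 (std::unordered_map; TR1's container is the same thing under the std::tr1 namespace). The class name, the store-smaller-index-first convention, and returning 0.0 for absent entries are my own choices:

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// One hash table per row index: rows[x] maps column y -> value.
// Symmetry: every pair is stored with the smaller index first, so
// (x, y) and (y, x) hit the same slot and only half the data is kept.
struct SymmetricSparseMatrix {
    std::vector<std::unordered_map<int, double> > rows;

    explicit SymmetricSparseMatrix(int n) : rows(n) {}

    void set(int x, int y, double v) {
        if (x > y) std::swap(x, y);
        rows[x][y] = v;
    }

    // Absent entries read as 0.0.
    double get(int x, int y) const {
        if (x > y) std::swap(x, y);
        std::unordered_map<int, double>::const_iterator it = rows[x].find(y);
        return it == rows[x].end() ? 0.0 : it->second;
    }

    // Remove every pair containing index m: clear m's own row, then erase
    // m from each earlier row (only rows[x] with x < m can reference m).
    void erase_index(int m) {
        rows[m].clear();
        for (int x = 0; x < m; ++x)
            rows[x].erase(m);
    }
};
```

The sorted-pair convention is what makes erase_index cheap: only rows 0..m-1 can hold a reference to m, so deletion scans m rows with one O(1) erase each instead of searching every table.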
Keith Randall
This makes sense, though it might be better for me to make a vector of hash tables, since I don't really know the size of the array in advance. Is there a reason not to have a hash table of hash tables?
Sarah
Could you also explain what you mean by trim()? That doesn't seem to be a member function of TR1's unordered_map or any other hash map I've found. I currently have one hash table for the x1 index, where x1 > x2. Each of those entries points to a separate hash table holding data for all x2 < x1. When "m" becomes obsolete, I delete its hash table and then iterate over all other x1 greater than m, using find() and erase() to remove references to m. What additional operations would be useful for memory management?
Sarah
A hash table of hash tables would be fine, but kind of overkill, as the indexes in one dimension will almost certainly be dense. Yes, a vector would be fine.

Sorry, trim is a general concept, not a specific function. Most implementations auto-grow the hash table on insert, but don't auto-shrink it on delete. You might want to trim your hash_maps periodically to save some memory (depending on your insert/delete pattern). There's no method for this in C++, but if size() is much less than bucket_count(), just copy the data to a new hash_map and delete the old one.
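The copy-and-replace trim described above might look like the sketch below (not code from the thread; the trim name and the 1/4 threshold are arbitrary choices, and it uses std::unordered_map's size(), bucket_count(), and swap()):

```cpp
#include <unordered_map>

// Shrink a table whose bucket array has grown much larger than its contents:
// copy the elements into a fresh table (sized for the current contents) and
// swap it into place, which frees the old, oversized bucket array.
void trim(std::unordered_map<int, double>& m) {
    // Arbitrary threshold: only bother when under 1/4 of the buckets are used.
    if (m.size() * 4 < m.bucket_count()) {
        std::unordered_map<int, double> compact(m.begin(), m.end());
        m.swap(compact);
    }
}
```

After the swap, the old bucket array is destroyed when `compact` goes out of scope, so all the elements survive while the excess bucket memory is released.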
Keith Randall