ansaurus

Question

Tracking whether a result has already been computed using a hash table

Answer 1

+1 A:

You have to be very careful with hashes. If you have a collision (the same hash value for different original values), you may end up with wrong results. Are you sure that calculating the hash of a matrix will be much more efficient than performing the actual operations? (It all depends on the number and complexity of these operations, obviously.)

Second concern: you haven't said anything about your cache eviction policy. Are you going to just add to the hash table without ever removing anything? Depending on the number of different matrices, you may run out of memory.
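To make both concerns concrete, here is a minimal sketch of a content-keyed cache. The names `Matrix`, `MatrixHash` and `ResultCache` are hypothetical, not taken from the question. The hash only selects a bucket; `std::unordered_map` still compares the full key with `operator==`, so a collision costs time but cannot return a wrong result. The eviction policy is deliberately crude (drop everything once a size limit is reached) and is only an assumption about what might be acceptable.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical matrix type: dimensions plus row-major entries.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    bool operator==(const Matrix& o) const {
        return rows == o.rows && cols == o.cols && data == o.data;
    }
};

// Combines the dimensions and every entry into one hash value.
struct MatrixHash {
    std::size_t operator()(const Matrix& m) const {
        std::size_t h = m.rows * 31 + m.cols;
        for (double v : m.data)
            h = h * 31 + std::hash<double>{}(v);
        return h;
    }
};

class ResultCache {
public:
    explicit ResultCache(std::size_t maxEntries) : maxEntries_(maxEntries) {}

    // Returns the cached result, or nullptr if this input has not been seen.
    const Matrix* find(const Matrix& key) const {
        auto it = table_.find(key);
        return it == table_.end() ? nullptr : &it->second;
    }

    void store(const Matrix& key, const Matrix& result) {
        auto it = table_.find(key);
        if (it != table_.end()) { it->second = result; return; }
        // Crude eviction (an assumption): forget everything when full.
        if (table_.size() >= maxEntries_) table_.clear();
        table_.emplace(key, result);
    }

    std::size_t size() const { return table_.size(); }

private:
    std::size_t maxEntries_;
    std::unordered_map<Matrix, Matrix, MatrixHash> table_;
};
```

Note that this stores a full copy of every key matrix, so it only saves memory or time if the same inputs really do recur often.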

DmitryK 2009-08-17 03:00:02

Thanks for the quick response. These two are the key questions. (a) Time: I am assuming that matrix-matrix multiplication for even a small matrix (4 by 4) will be more expensive than calculating the hash function. (b) Space: I am assuming that the overhead of storing an entry in a hash table will be smaller than storing multiple copies of the same matrix. Are these assumptions reasonable? I really do not have a sense of the space/time overhead of hash tables. Would a "minimal perfect hash" be appropriate?

2009-08-17 03:13:51

2009-08-17 03:21:01

Correct, the hash value obviously takes less space than the matrix itself. By the way, there are some good comments here: http://stackoverflow.com/questions/934827/hash-function-to-matrix

DmitryK 2009-08-17 15:42:10

Answer 2

A:

Answering the easy part first: For a C++ library of matrix operations have a look at newmat which has a large range of functionality built-in, and performance-wise is pretty efficient.

For your specific case of building a hash to speed up computations: caching is only worthwhile if you are going to be performing operations on a very limited set of matrices. To build a unique hash for the matrix you will need to visit every entry and calculate the hash based on each entry's location and value. Worse still, matrix multiplication is not commutative in general, i.e. A*B != B*A except in special cases.

This means your cache would have to store one entry for each specific calculation. So unless you're dealing with a very small range of input matrices the memory cost of holding all the results will be huge.
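As a rough illustration of "one entry per specific calculation", the sketch below keys the cache on the ordered pair of operands, so A*B and B*A occupy separate entries. Everything here (`Mat2`, `multiply`, `ProductCache`) is a hypothetical name. The operands are identified by address, which assumes each matrix stays alive and unmodified for as long as it is cached.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical 2x2 matrix [[a, b], [c, d]], just big enough to show that order matters.
struct Mat2 {
    double a, b, c, d;
};

inline Mat2 multiply(const Mat2& x, const Mat2& y) {
    return Mat2{x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
                x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
}

// Ordered pair (left operand, right operand), identified by address.
typedef std::pair<const Mat2*, const Mat2*> ProductKey;

// std::less gives a well-defined total order even for unrelated pointers.
struct KeyLess {
    bool operator()(const ProductKey& p, const ProductKey& q) const {
        std::less<const Mat2*> lt;
        if (lt(p.first, q.first)) return true;
        if (lt(q.first, p.first)) return false;
        return lt(p.second, q.second);
    }
};

class ProductCache {
public:
    // Returns left*right, computing it only the first time this ordered pair is seen.
    const Mat2& product(const Mat2& left, const Mat2& right) {
        ProductKey key(&left, &right);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++misses_;
            it = cache_.emplace(key, multiply(left, right)).first;
        }
        return it->second;
    }

    std::size_t misses() const { return misses_; }

private:
    std::map<ProductKey, Mat2, KeyLess> cache_;
    std::size_t misses_ = 0;
};
```

With k distinct input matrices this table can grow to k*k entries for multiplication alone, which is where the memory cost comes from.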

For very small matrices or column/row vectors, the marginal overhead of the full computation vs. calculating a hash is tiny... so caching will offer little extra benefit unless you are doing so many calculations that fractions of a millisecond difference in time will accumulate enough to make a difference.
For very large matrices, it's possible that you could see a benefit from caching if your possible input matrices are very limited. If they could be anything, the likely benefit is outweighed by the rarity of getting a repeat strike on the cache, as well as the memory cost and complexity of managing the cache.

Caching would speed up the results, but only in a very limited set of circumstances.

Given that you've asked for advice on a library as well, this sounds like a case of premature optimisation. I would implement your program without caching, performance profile it, and if you find a performance bottleneck in the array arithmetic, then consider ways to optimise the number crunching.

Edit: on calculating the hash: if you have an n-by-m matrix X, then calculating the hash for that one matrix is at least as complicated as the operation R*X*C, where R is the row vector [1,..,n] and C is the column vector [1,..,m]. I haven't worked out where the break-even point is, but for really small matrices on the order of 2x2 or 3x3, doing the raw calculation is going to be cheaper than calculating the hash.
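To show where that cost comes from, here is a sketch of a content hash that touches every entry once. It uses 64-bit FNV-1a, which is my choice for illustration and not something from the question; `hashMatrix` is a hypothetical helper, and the matrix is assumed to be stored row-major in a `std::vector<double>`. Because each step depends on the running value, the same numbers in different positions normally give a different hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes 64-bit doubles");

// Hypothetical helper: FNV-1a over the dimensions and the bit pattern of
// every entry. Cost is proportional to rows * cols.
inline std::uint64_t hashMatrix(const std::vector<double>& entries,
                                std::size_t rows, std::size_t cols) {
    const std::uint64_t kOffsetBasis = 14695981039346656037ULL;
    const std::uint64_t kPrime = 1099511628211ULL;
    std::uint64_t h = kOffsetBasis;

    // Feed one 64-bit word into the hash, one byte at a time.
    auto mix = [&h, kPrime](std::uint64_t word) {
        for (int i = 0; i < 8; ++i) {
            h ^= (word >> (8 * i)) & 0xFFu;
            h *= kPrime;
        }
    };

    mix(static_cast<std::uint64_t>(rows));
    mix(static_cast<std::uint64_t>(cols));
    for (double v : entries) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // hash the raw bit pattern
        mix(bits);
    }
    return h;
}
```

For a 2x2 matrix this is 48 xor-and-multiply steps (16 for the dimensions, 32 for the entries), compared with 8 multiplications and 4 additions for a 2x2 product, so hashing really can cost more than the operation it is meant to avoid.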

nasbbig 2009-08-17 03:41:04

Good point. Yes, the total number of different matrices will be very small compared to the number of repeat computations. I will give it a try and see how much difference it makes for different sizes of matrices and numbers of distinct calculations. I am intrigued by your last comment: I was only thinking about using a pointer to the rightOperand matrix as the key and a pointer to the result matrix as the value. Can a pointer be used as a key, treating it as an `int`? Thanks.

2009-08-17 04:03:18

Answer 3

A:

This technique is called memoization; see the Wikipedia article for details.

This article also mentions some libraries that can help you.

qrdl 2009-08-17 06:29:20

That is exactly what I had in mind. Thanks for the pointer.

2009-08-17 17:28:06

@unknown That's what upvotes and accepted answers are for :)

qrdl 2009-08-18 06:04:30
