views:

392

answers:

3

I'm using boost sparse matrices holding bool's and trying to write a comparison function for storing them in a map. It is a very simple comparison function. Basically, the idea is to look at the matrix as a binary number (after being flattened into a vector) and sorting based on the value of that number. This can be accomplished in this way:

for(unsigned int j = 0; j < maxJ; j++)
{
  for(unsigned int i = 0; i < maxI; i++)
  {
    if(matrix1(i,j) < matrix2(i,j) return true;
    else if(matrix1(i,j) > matrix2(i,j) return false;
  }
}
return false;

However, this is inefficient because of the sparseness of the matrices and I'd like to use iterators for the same result. The algorithm using iterators seems straightforward, i.e. 1) grab the first nonzero cell in each matrix, 2) compare j*maxJ+i for both, 3) if equal, grab the next nonzero cells in each matrix and repeat. Unfortunately, in code this is extremely tedious and I'm worried about errors.

What I'm wondering is (a) is there a better way to do this and (b) is there a simple way to get the "next nonzero cell" for both matrices? Obviously, I can't use nested for loops like one would to iterate through one sparse matrix.

Thanks for your help.

--

Since it seems that the algorithm I proposed above may be the best solution in my particular application, I figured I should post the code I developed for the tricky part, getting the next nonzero cells in the two sparse matrices. This code is not ideal and not very clear, but I'm not sure how to improve it. If anyone spots a bug or knows how to improve it, I would appreciate some comments. Otherwise, I hope this is useful to someone else.

typedef boost::numeric::ublas::mapped_matrix<bool>::const_iterator1 iter1;
typedef boost::numeric::ublas::mapped_matrix<bool>::const_iterator2 iter2;

// Grabs the next nonzero cell in a sparse matrix after the cell pointed to by i1, i2.
std::pair<iter1, iter2> next_cell(iter1 i1, iter2 i2, iter1 end) const
{
 if(i2 == i1.end())
 {
  if (i1 == end)
   return std::pair<iter1, iter2>(i1, i2);
  ++i1;
  i2 = i1.begin();
 }
 else
 {
  ++i2;
 }

 for(; i1 != end;)
 {
  for(; i2 != i1.end(); ++i2)
  {
   return std::pair<iter1, iter2>(i1,i2);
  }
  ++i1;
  if(i1 != end) i2 = i1.begin();
 }
 return std::pair<iter1, iter2>(i1, i2);
}
A: 

(a) I don't fully understand what you're trying to accomplish, but if you want to compare if both matrices have the same value at the same index it's sufficient to use elementwise matrix multiplication (which should be implemented for sparse as well):

matrix3 = element_prod (matrix1, matrix2);

That way you'll get for each index:

0 (false) * 1 (true) = 0 (false)
0*0 = 0
1*1 = 1

So resulting matrix3 will have your solution in one line :)

count0
That is an interesting suggestion, but I need < to order the matrices. Essentially, what I need is a way to compute XOR on these matrices that uses the sparse matrix iterators and performs the XOR in the order j*maxJ+i so I can stop computing when I find a TRUE in the XOR knowing which matrix is "less".
scandido
A: 

It seems to me we're talking about implementing bitwise,elementwise operators on boost::sparse_matrix, since comparing if one vector (or matrix) is smaller than another without using any standard vector norms demands special operators (or special mappings/norms).

To my knowledge boost does not provide special operators for binary matrices (not to speak of sparse binary matrices). There are unlikely any straightforward solutions to this using BLAS level matrix/vector algebra. Binary matrices have an own place in the linear algebra field, so there are tricks and theorems but i doubt those are easier than your solution.

Your question could be reformulated as: How do i sort efficiently astronomically large numbers represented by a 2d-bitmap (n=100 so 100x100 elements would give you a number like 2^10000).

Good question !

count0
Yes, there is obviously not a magic bullet. The fact that the number of true entries in the matrix is sparse, so we can get a solution that runs in acceptable time (via the algorithm I mentioned). The problem is that the code for getting the next iterator for both matrices simultaneously is not necessarily clean, and I wonder if there is a good way to iterate through two sparse 2D matrices together.
scandido
+1  A: 

I like this question, by the way.

Let me pseudocode out what I think you're asking

declare list of sparse matrices ListA
declare map MatMAp with a sparse Matrix type mapping to a double, along with a
`StrictWeakMatrixOrderer` function which takes two sparse matrices.

Insert ListA into MatMap.

The Question: How do I write a StrictWeakMatrixOrderer efficiently?

This is an approach. I'm inventing this on the fly....


Define a function flatten() and precompute the flattened matrices, storing the flattened vectors in a vector(or another container with a random indexing upper bound). flatten() could be as simple as concatenating each row(or column) with the previous one(which can be done in linear time if you have a constant-time function to grab a row/column).

This yields a set of vectors with size on the order of 10^6. This is a tradeoff - saving this information instead of on-the-fly computing it. This is useful if you're going to be doing a lot of compares as you go along.

Remember, zeros contain information - dropping them will possibly yield two vectors equal to each other, whereas their generating matrix may not be equal.

Then, we have transformed the algorithm question from "order matrices" into "order vectors". I've never heard of a distance metric for matrices, but I've heard of distance metrics for vectors.

You could use a "sum of differences" ordering aka Hamming distance. (foreach element that's different, add 1). That will be a O(n) algorithm:

for i = 0 to max.
  if(a[i] != b[i])
     distance++

return distance

The Hamming distance satisfies these conditions

d(a,b) = d(b,a)
d(a,a) = 0
d(x, z) <= d(x, y) + d(y, z)

Now to do some off-the-cuff analysis....

  • 10^6 elements in a matrix(or its corrosponding vector).
  • O(n) distance metric.
    • But it's O(n) compares. If each array access has O(m) time, then you would have an O(n*(n+n)) = O(n^2) metric. So you have to have < O(n) access. It turns out that std::vector [] operator provides "amortized constant time access to arbitrary elements" according to SGI's STL site.
  • Providing you have sufficient memory to store k*2*10^6, where k is the number of matrices you are managing, this is a working solution that uses lots of memory in exchange for being linear.
Paul Nathan
The key type is the sparse matrix and that maps to a double. (The application is related to a probability distribution over the sparse matrices.) Otherwise, your assessment is correct. Regarding your question, no it does not matter so long as it implements a strict weak ordering and only identical matrices are equivalent.
scandido
How big is a matrix? is the widest side 10^3? 10^5? What % of the matrix is zeros?
Paul Nathan
The size of the matrix varies, but typically it is from 10^2 to 10^3. The sparseness of the matrix varies quite a bit as the matrix represents a state of a diffusion process.
scandido
This is a viable alternative, but in my particular application less desirable than the initial solution I proposed because of the memory usage. Thank you for the suggestion, someone else approaching this problem may benefit from your answer quite a bit.
scandido