views:

108

answers:

2

Hi, I wrote a small sparse matrix class with the member:

std::map<int,std::map<int,double> > sm;

The method below is the function i use to access the elements of a matrix, if not possible through an iterator:

double matrix::operator()(int r,int c) const
{
    std::map<int,std::map<int,double> >::const_iterator i = sm.find(r);
    if(i==sm.end()) { return 0.0; }
    std::map<int,double>::const_iterator j = i->second.find(c);
    if(j==i->second.end()) { return 0.0; }
    return j->second;
}

Still this function needs to get called very often. Has somebody an idea how to improve this function? Thanks ahead.

+3  A: 

If you are to write your own code rather than using a library, then this change may improve performance dramatically:

std::map<std::pair<int,int>, double> sm;

To increase farther you can go to hashed containers:

std::unordered_map<std::pair<int,int>, double> sm;

(use tr1, boost or C++0x)

EDIT: in the first case you can iterate over row like this:

for(auto& cell : make_pair(
    sm.lower_bound(make_pair(row, 0)), sm.lower_bound(make_pair(row+1, 0))))
{ ... }

I think that you can do the above by one call to equal_range with an appropriate functor if you use Boost.MultiIndex.

over everything:

for(auto& cell : sm)
{ ... }

to iterate over a column you need to search for every cell separately. Note that your datastructure doesn't provide this operation either.

ybungalobill
Thanks. I got one question though: wouldn't it be in the maps you suggest be more complicated to iterate through rows and columns?
litro
@litro: Not in the first one. To iterate over a row you just call equal_range with beginning and end of this row and get a pair of iterators to the beginning and end of this row. Both this and yours method doesn't provide an easy way to iterate over columns.
ybungalobill
I will change the map type to one of above suggested. I already tested using the unordered map on my type which performed worse. Though about my accessor: Is this already optimal or can this specific accessor be improved?
litro
@litro: Your accessor is optimal for your data-structure.
ybungalobill
Would you mind giving an example for iterating over all rows? I seem to have a case of the stupids.
litro
@litro: updated.
ybungalobill
Thanks Again. That helped.
litro
A: 

You're probably going to hate this, but for the following (small for display purposes) row of a matrix:

1 0 0 5 9 0 0 0

you could have a bit array (8 bits in this case) where for each bit was set or cleared to reflect if there was a non-zero or zero at that position in the matrix.

Then you just store the non-zero numbers in a regular array packed together like:

{ 1, 5, 9 }

and the binary flags

0x98 // binary 1001 1000

And to iterate through the matrix row you would just use bit operations to loop through the bit array:

while (! /* not at the end of the bit array */ ) {
    f = get_next_from_bit_array();  // This is just bitwise shift and bitwise & 
    if (!f) {
       val = 0;
    } else {
       val = compressed_row[i];
       i++;
    }
    do_action(val);
}

My code here is just for demonstration and isn't very C++, but I hope you can get the idea.

Using the bit array will allow you to examine a much smaller area of memory for sparse rows, which means fewer memory accesses and better cache locality.

If the matrixes you are working with are extremely sparse then you could extend this for the other dimension as well (having sparse array of rows), but the chance that you have entire rows empty is probably low.

nategoose
This is an interesting approach, but has poor scaling characteristics compared to the original. Ideally, the storage for a sparse matrix does not increase if the size of the matrix increases but the number of non-zero elements stays constant. His approach does have that characteristic but yours doesn't.
Reinderien
This does scale decently since only the bit arrays grow as that dimension grows with new 0 elements, and this growth is very small. I chose to suggest this after reading a comment to another suggestion that mentioned the importance of iterating through the matrix. There are several other ways to implement this that work better for different levels of sparsity -- list with (_index_, _value_) comes to mind.
nategoose