I've got a strange bug that I'm hoping a more experienced programmer might have some insight into. I'm using the boost ublas sparse matrices, specifically mapped_matrix, and there is an intermittent bug that shows up eventually, but not in the initial phases of the program. This is a large program, so I cannot post all the code, but the core idea is that I call a function which belongs to a particular class:

bool MyClass::get_cell(unsigned int i, unsigned int j) const
{
    return c(i,j);
}

The variable c is defined as a member of the class

boost::numeric::ublas::mapped_matrix<bool> c;

When the bug occurs, the program seems to stop (but does not crash). Debugging with Eclipse, I can see that the program enters the boost mapped_matrix code and continues several levels down into std::map, std::_Rb_tree, and std::less. Occasionally the trace ends in std::map, std::_Rb_tree, and std::_Select1st instead. Code is still executing: the active line and the contents of memory keep changing inside _Rb_tree, but execution never seems to return to the level of std::map. The line in std::map that the program is stuck on is the return statement of the following function.

const_iterator
find(const key_type& __x) const
{ return _M_t.find(__x); }

It seems to me that there is some element in the c matrix that the program is looking for but somehow the underlying storage mechanism has "misplaced it". However, I'm not sure why or how to fix it. That could also be completely off base.
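For readers following along, here is a rough sketch of what a map-backed sparse lookup like the one in the trace does. It is not Boost's code: SparseBoolMatrix, its members, and the row-major key i * cols + j are assumptions made for illustration. The point is that a const read turns into a single std::map::find, which walks the tree's node pointers; if something else in the program has overwritten those nodes, that walk has no guarantee of terminating, which would look like the hang described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a map-backed sparse matrix (not Boost's implementation).
class SparseBoolMatrix {
public:
    SparseBoolMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols) {}

    // Store a value under a single key derived from (i, j).
    void set(std::size_t i, std::size_t j, bool value) {
        assert(i < rows_ && j < cols_);
        cells_[i * cols_ + j] = value;  // assumed row-major key
    }

    // A const read is one find() on the map; absent cells read as false.
    bool get(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        std::map<std::size_t, bool>::const_iterator it =
            cells_.find(i * cols_ + j);
        return it == cells_.end() ? false : it->second;
    }

    // Number of explicitly stored cells.
    std::size_t stored() const { return cells_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::map<std::size_t, bool> cells_;
};
```

With this sketch, reading a cell that was never written returns false and does not add an entry, so a healthy lookup always comes back quickly whether or not the element exists.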

Any help you can provide would be greatly appreciated. If I have not included the right information in this question, please let me know what I'm missing. Thank you.

A: 

Is this part of a multithreaded program?

I ask, because usually when I see problems in STL, it ends up being a problem with unsynchronized access.

Managu
I was initially trying to use multithreading, but the same problem exists when the program is run within a single thread of execution.
scandido
+1  A: 

Some things to try to debug the code (not necessarily permanent changes):

  • Change the bool to an int in the matrix type for c, to see if the matrix expects numeric types.
  • Change the matrix type to another with a similar interface, possibly plain old matrix.
  • Run the app under Valgrind (if you're on Linux) to check that you're not corrupting memory.

If that fails, you could try calling get_cell every time you modify the matrix to see what might be causing the problem.
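A minimal sketch of that read-back idea, assuming every write can be routed through one helper. ToyMatrix is only a stand-in so the example compiles on its own, and set_and_verify and its where parameter are made-up names; the helper assumes the real matrix type accepts the same m(i, j) = value write and a const m(i, j) read. It logs the call site before touching the matrix, so if the read-back is where the program hangs, the last line on stderr points at the write that preceded it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <map>

// Toy stand-in with the same call shape as the matrix in the question.
struct ToyMatrix {
    std::size_t cols;
    std::map<std::size_t, bool> cells;

    explicit ToyMatrix(std::size_t c) : cols(c) { assert(c > 0); }

    // Write access: creates the cell if it does not exist yet.
    bool& operator()(std::size_t i, std::size_t j) {
        return cells[i * cols + j];
    }

    // Const read access: absent cells read as false.
    bool operator()(std::size_t i, std::size_t j) const {
        std::map<std::size_t, bool>::const_iterator it =
            cells.find(i * cols + j);
        return it != cells.end() && it->second;
    }
};

// Hypothetical helper: log, write, then read back through the const path.
template <typename Matrix>
bool set_and_verify(Matrix& m, std::size_t i, std::size_t j, bool value,
                    const char* where) {
    std::fprintf(stderr, "write (%lu,%lu) at %s\n",
                 static_cast<unsigned long>(i),
                 static_cast<unsigned long>(j), where);
    m(i, j) = value;
    const Matrix& cm = m;  // force the const overload, as get_cell does
    bool ok = (cm(i, j) == value);
    if (!ok) {
        std::fprintf(stderr, "read-back mismatch at %s\n", where);
    }
    return ok;
}
```

A call such as set_and_verify(c, i, j, true, "update_neighbours") would replace a direct write; the label is just whatever string identifies the call site.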

Failing that, you may have to try to reduce the problem to a much smaller subset of code which you can post here.

It might help if you tell us what compiler and OS you're using.

jon hanson
Thank you for your suggestions. I tried changing from a bool to an int and the same problem occurs. I changed to a plain matrix and the problem goes away, but it is too slow to actually use without a sparse matrix. I also tried calling get_cell, but there are many places where the matrix is modified and I'm not confident that I found them all. Unfortunately, I've had to use a workaround and put this issue on the back burner for a while. If I'm able to determine the problem at a later time, I will post it.
scandido
Finding leaked memory with valgrind solved the problem. Thank you for your help.
scandido