views:

115

answers:

2

I'm having trouble debugging a segmentation fault. I'd appreciate tips on how to go about narrowing in on the problem.

The error appears when an iterator tries to access an element of a struct Infection, defined as:

struct Infection {
public:
  explicit Infection( double it, double rt ) : infT( it ), recT( rt ) {}
  double infT; // infection start time
  double recT; // scheduled recovery time
};

These structs are kept in a special structure, InfectionMap:

typedef boost::unordered_multimap< int, Infection > InfectionMap;

Every member of class Host has an InfectionMap carriage. Recovery times and associated host identifiers are kept in a priority queue. When a scheduled recovery event arises in the simulation for a particular strain s in a particular host, the program searches through carriage of that host to find the Infection whose recT matches the recovery time (double recoverTime). (For reasons that aren't worth going into, it's not as expedient for me to use recT as the key to InfectionMap; the strain s is more useful, and coinfections with the same strain are possible.)

assert( carriage.size() > 0 );
pair<InfectionMap::iterator,InfectionMap::iterator> ret = carriage.equal_range( s );
InfectionMap::iterator it;
for ( it = ret.first; it != ret.second; it++ ) {
  if ( ((*it).second).recT == recoverTime ) { // produces seg fault
    carriage.erase( it );
  }
}

I get a "Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address..." on the line specified above. The recoverTime is fine, and the assert(...) in the code is not tripped.

As I said, this seg fault appears 'randomly' after thousands of successful recovery events.

How would you go about figuring out what's going on? I'd love ideas about what could be wrong and how I can further investigate the problem.


Update

I added a new assert and a check just inside the for loop:

assert( carriage.size() > 0 );
assert( carriage.count( s ) > 0 );
pair<InfectionMap::iterator,InfectionMap::iterator> ret = carriage.equal_range( s );
InfectionMap::iterator it;
cout << "carriage.count(" << s << ")=" << carriage.count(s) << endl;
for ( it = ret.first; it != ret.second; it++ ) {
  cout << "(*it).first=" << (*it).first << endl; // error here
  if ( ((*it).second).recT == recoverTime ) {
    carriage.erase( it );
  }
}

The EXC_BAD_ACCESS error now appears at the (*it).first call, again after many thousands of successful recoveries. Can anyone give me tips on how to figure out how this problem arises? I'm trying to use gdb. Frame 0 from the backtrace reads

"#0 0x0000000100001d50 in Host::recover (this=0x100530d80, s=0, recoverTime=635.91148029170529) at Host.cpp:317"

I'm not sure what useful information I can extract here.


Update 2

I added a break; after the carriage.erase(it). This works.

A: 

Compile the program with gcc -g and run it under gdb. When you get an EXC_BAD_ACCESS crash you'll drop into the gdb command line. At that point you can type bt to get a backtrace, which will show you how you got to the point where the crash occurred.

Paul R
I've been staring at the backtrace for an hour. It points to the line I noted above. I'll read more about how to decipher the addresses.
Sarah
+6  A: 

Correct me if I'm wrong but I would bet that erasing an item in an unordered multimap invalidates all iterators pointing into it. Try "it = carriage.erase(it)". You'll have to do something about ret as well.

Update in reply to your latest update:

The reason breaking out of the loop after calling "carriage.erase(it)" fixed the bug is because you stopped trying to access an erased iterator.

Noah Roberts
+1, except that `erase` does not return the next iterator, you have to keep track of that yourself.
avakar
That and don't keep incrementing it in the for loop.
SB
@Noah and @avakar: Does your advice hold if there's always only one Infection with the specific recovery time? (I've confirmed that this is the case when the seg fault appears.) Using "it=carriage.erase(it)" creates a new seg fault that appears immediately in the simulation.
Sarah
@SB: Could you explain why I shouldn't increment it in the for loop?
Sarah
@sarah - you need to increment it only when you don't erase. Otherwise you're already at the next one. I've never used the unordered_xxx pieces of boost so I was mostly guessing based on how Meyers described the implementation. Unordered_xxx are vectors of vectors or something similar so I expect them to behave as such. For example: if you're iterating to a destination and you delete objects between here and there...there moves and your pointer to there is no longer pointing at there.
Noah Roberts
Based on the boost documentation you should be able to grab direct pointers to your rev contents and your algorithm will hold. Iterators can be invalidated but apparently pointers and references are never: http://www.boost.org/doc/libs/1_43_0/doc/html/unordered/comparison.html
Noah Roberts