(Note to any future readers: The error, unsurprisingly, is in my code and not std::_Rb_tree_rebalance_for_erase () )

I'm somewhat new to programming and am unsure how to deal with a segmentation fault that appears to be coming from a std function. I hope I'm doing something stupid (e.g., misusing a container), because I have no idea how to fix it.

The precise error is

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x000000010000e593 in Simulation::runEpidSim (this=0x7fff5fbfcb20) at stl_tree.h:1263
#2 0x0000000100016078 in main () at main.cpp:43

The function that exits successfully just before the segmentation fault updates the contents of two containers. One is a boost::unordered_multimap called carriage; it contains one or more struct Infection objects. The other is an EventPQ (a std::multiset< Event, std::less< Event > >) called ce.

void Host::recover( int s, double recoverTime, EventPQ & ce ) {

  // Clearing all serotypes in carriage
  // and their associated recovery events in ce
  // and then updating susceptibility to each serotype
  double oldRecTime;
  int z;
  for ( InfectionMap::iterator itr = carriage.begin(); itr != carriage.end(); itr++ ) {
    z = itr->first;
    oldRecTime = (itr->second).recT;
    EventPQ::iterator epqItr = ce.find( Event(oldRecTime) );
    assert( epqItr != ce.end() );
    ce.erase( epqItr );
    immune[ z ]++; 
  }
  carriage.clear();
  calcSusc(); // a function that edits an array 
  cout << "Done with sync_recovery event." << endl;
}

The last cout << line appears immediately before the seg fault.

My idea so far is that the rebalancing is being attempted on ce immediately after this function, but I am unsure why the rebalancing would be failing.


Update

I've confirmed the seg fault goes away (though the program then immediately crashes for other reasons) when I remove ce.erase( epqItr );. I am able to remove events successfully in another place in the code; the code I use there to erase items in ce is identical to what's here.

Backtracing without optimization (thanks, bdk) reveals much more information:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000000000000c
0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
(gdb) backtrace
#0 0x00007fff8062b144 in std::_Rb_tree_rebalance_for_erase ()
#1 0x00000001000053d2 in std::_Rb_tree, std::less, std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at stl_tree.h:1263
#2 0x0000000100005417 in std::multiset, std::allocator >::erase (this=0x7fff5fbfdfe8, __position={_M_node = 0x10107cb50}) at stl_multiset.h:346
#3 0x000000010000ba71 in Simulation::runEpidSim (this=0x7fff5fbfcb40) at Simulation.cpp:426
#4 0x000000010001fb31 in main () at main.cpp:43

Unless Xcode is reading line numbers wrong, the only stl_tree.h on my hard drive is blank on line 1263.

A few people asked to see the function that calls recover. It's a bit complicated:

struct updateRecovery {
  updateRecovery( int s, double t, EventPQ & ce ) : s_(s), t_(t), ce_(ce) {}
  void operator() ( boost::shared_ptr<Host> ptr ) {
    ptr->recover( s_, t_, ce_ );
  }
private:
  int s_;
  double t_;
  EventPQ & ce_;
};

// allHosts is a boost::multi_index_container of boost::shared_ptr< Host >
// currentEvents is the EventPQ container
// it is an iterator to a specific member of allHosts
allHosts.modify( it, updateRecovery( s, t, currentEvents ) );
cout << "done with recovery" << endl;

The last cout prints. The code worked before without this particular version of the recovery function.

Noah Roberts correctly pointed out that the problem is at Simulation.cpp, line 426. Jump below for the embarrassing solution.

+2  A: 

Possibly you're holding onto an iterator into ce across the call to recover. If recover happens to remove that item, the iterator will be invalidated, and any later use of it (say, an attempt to erase it) could result in a seg fault.

It would help if we could see more context of how ce is used before and after the call to recover.
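
A minimal sketch of that hazard and one way around it, under some loud assumptions: plain double event times stand in for Event, EventTimes stands in for EventPQ, and eraseOneAt and processEvent are hypothetical names that do not appear in the question's code. It also assumes event times are unique; with duplicate times, the second lookup could match a different event scheduled at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for the question's EventPQ.
typedef std::multiset<double> EventTimes;

// Erases one event scheduled at time t, if there is one.
// Returns true when an element was actually removed.
bool eraseOneAt(EventTimes& events, double t) {
    EventTimes::iterator it = events.find(t);
    if (it == events.end()) {
        return false;
    }
    events.erase(it);
    return true;
}

// Mimics the call sequence in the question: the caller looks up an event,
// a callee (recover() in the question) erases it, and then the caller
// wants that event gone as well. Returns the number of events left.
std::size_t processEvent(EventTimes& events, double t) {
    EventTimes::iterator current = events.find(t);
    assert(current != events.end());
    const double key = *current;  // copy the key while the iterator is valid

    eraseOneAt(events, key);      // callee removes the element; 'current' is now invalid

    // Calling events.erase(current) here would be undefined behavior.
    // Looking the key up again is safe and does nothing if it is already gone.
    eraseOneAt(events, key);
    return events.size();
}
```

With events holding 1.0, 2.5 and 4.0, processEvent(events, 2.5) leaves two events behind, and the second eraseOneAt call simply reports that nothing was left to erase.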

Mark B
You're right! (I need to look into how and when tree rebalancing happens.)
Sarah
A: 

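Perhaps the call to assert is compiled out in your configuration (assert expands to nothing when NDEBUG is defined), in which case ce.erase( epqItr ) could be called with ce.end(). Assertions in production code are usually a Bad Idea[TM].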

You could also be exceeding immune's boundaries.

Try:

    if (epqItr != ce.end())
    {
        ce.erase(epqItr);
        // immuneSize is a placeholder for the number of elements in immune
        if (z >= 0 && z < immuneSize)
        {
            ++immune[z];
        }
    }
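
For illustration, the same two checks can be packaged as a self-contained function. This is only a sketch under assumptions: double event times stand in for Event, EventTimes stands in for EventPQ, immune is assumed to be a std::vector<int> (the question only says it is an array), and clearEvent is a hypothetical name. Unlike the snippet above, it changes nothing unless both checks pass.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the question's EventPQ.
typedef std::multiset<double> EventTimes;

// Erases one event at time t and increments immune[z].
// Both checks are ordinary if statements, so they stay active even in
// builds where NDEBUG turns assert() into a no-op.
// Returns false, leaving both containers untouched, if either check fails.
bool clearEvent(EventTimes& events, std::vector<int>& immune, int z, double t) {
    EventTimes::iterator it = events.find(t);
    if (it == events.end()) {
        return false;  // no such event: erasing end() would be undefined behavior
    }
    if (z < 0 || static_cast<std::size_t>(z) >= immune.size()) {
        return false;  // z is outside immune's bounds
    }
    events.erase(it);
    ++immune[z];
    return true;
}
```

The caller can then decide how to react to a false return (log it, throw, or abort) instead of silently running into undefined behavior.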
Johnsyweb
A: 

The problem was that on line 426 of Simulation.cpp, I tried to delete an event in the EventPQ currentEvents (a.k.a. ce) container that my recover() function had just deleted. The iterator had obviously been invalidated. Dumb.

Lessons:

  • Debug code that has been compiled without optimization
  • Pay close attention to what the frames outside std imply

And for the future: check memory accesses with Valgrind

I'm still stumped as to why the debugger referred me to an apparently blank line in stl_tree.h.

I'm hugely grateful to the people who have helped me work through this. I'm going to revise my question so it's more concise for any future readers.

Sarah
That's a general rule to follow when you run into a crash. Look at the call stack from the top down until you hit your own code and start looking there. Generally you'll find that the bug is caused somewhere in that stack trace within your own code, not the library. When that's not the case, you'll usually find you hit undefined behavior (UB) a half hour ago and are only now running into whatever it caused. I've actually never run into a library bug. I've run into a LOT of compiler bugs, but I can't think of any library bug.
Noah Roberts