We have a program that uses both boost's matrix and and sparse matrix libraries and we are attempting to integrate boost threads. However, when we move from a single-threaded to multi-threaded application we are getting segmentation faults that do not occur in the single-thread case.
We've debugged using gdb (in Eclipse) and I've found that the seg faults are occurring during function calls in the boost code, i.e. I try accessing an entry of a sparse matrix and the stack trace goes into boost code and dies at some point (not always the same point) in those files.
I'm confused because these matrices are allocated within the individual threads and all shared resources are protected by mutex locks. Furthermore, I didn't think threading often caused seg faults, just multiple accesses and bad data. However, I'm obviously not experienced with multi-threaded programming so I was hoping that someone who is more experienced in this area might be able to provide some advice.
I'm using the boost managed make system and an example compilation command is
g++ -DNDEBUG -I"/usr/local/include/boost-1_38" -I/usr/local/include -I"/home/scandido/workspace/BigSHOT/src" -I"/home/scandido/workspace/BigSHOT/src/base" -O3 -Wall -c -fmessage-length=0 `freetype-config --cflags` -pthread -MMD -MP -MF"src/utils/timing_info.d" -MT"src/utils/timing_info.d" -o"src/utils/timing_info.o" "../src/utils/timing_info.cpp"
and the linker command is
g++ -L"/usr/local/lib" -L/usr/local/lib -o"BigSHOT" ./src/utils/timing_info.o ... many more objects ... ./src/base/pomdp/policy_fn/EventDriven.o ./src/base/pomdp/policy_fn/Greedy.o ./src/anotheralgorithm.o -lboost_serialization-gcc43-mt -lpthread -lboost_thread-gcc43-mt -lboost_program_options-gcc43-mt -lboost_iostreams-gcc43-mt -lpng -lpngwriter -lz -lfreetype
Here is a stack trace for the thread that seg faults:
Thread [5] (Suspended: Signal 'SIGSEGV' received. Description: Segmentation fault.)
17 boost::numeric::ublas::mapped_matrix<bool, boost::numeric::ublas::basic_row_major<unsigned long, long>, boost::numeric::ublas::map_std<unsigned long, bool, std::allocator<std::pair<unsigned long const, bool> > > >::operator() /usr/local/include/boost-1_38/boost/numeric/ublas/matrix_sparse.hpp:377 0x000000000041c328
16 BigSHOT::Fire1FireState::get_cell() /home/scandido/workspace/BigSHOT/src/systems/fire1/pomdp/Fire1State.cpp:51 0x0000000000419a75
15 BigSHOT::Fire1SquareRegionProbObsFn::operator() /home/scandido/workspace/BigSHOT/src/systems/fire1/obs_fn/Fire1SquareRegionProbObsFn.cpp:92 0x000000000042ac37
14 BigSHOT::Fire1SquareRegionProbObsFn::operator() /home/scandido/workspace/BigSHOT/src/systems/fire1/obs_fn/Fire1SquareRegionProbObsFn.cpp:66 0x000000000042a8bf
13 BigSHOT::BayesFilterFn<BigSHOT::Fire1Belief, BigSHOT::Fire1State, BigSHOT::Fire1Action, BigSHOT::Fire1Observation>::update() /home/scandido/workspace/BigSHOT/src/base/pomdp/filter_fn/BayesFilterFn.h:50 0x0000000000445c3b
12 BigSHOT::HyperParticleFilter<BigSHOT::Fire1Belief, BigSHOT::Fire1Action, BigSHOT::Fire1Observation>::future_evolution() /home/scandido/workspace/BigSHOT/src/base/hpf/HyperParticleFilter.h:127 0x00000000004308e0
11 BigSHOT::HyperParticleFilter<BigSHOT::Fire1Belief, BigSHOT::Fire1Action, BigSHOT::Fire1Observation>::hyperfilter() /home/scandido/workspace/BigSHOT/src/base/hpf/HyperParticleFilter.h:86 0x000000000043149b
10 BigSHOT::HyperParticleFilterSystem<BigSHOT::HyperCostFn<BigSHOT::Fire1Belief, BigSHOT::Fire1Action>, BigSHOT::PolicyFn<BigSHOT::Fire1Belief, BigSHOT::Fire1Action>, BigSHOT::Fire1Belief, BigSHOT::Fire1Action, BigSHOT::Fire1Observation>::next_stage() /home/scandido/workspace/BigSHOT/src/base/hpf/HyperParticleFilter.h:189 0x0000000000446180
9 hyperfilter() /home/scandido/workspace/BigSHOT/src/anotheralgorithm.cpp:126 0x0000000000437798
8 hf_thread_wrapper() /home/scandido/workspace/BigSHOT/src/anotheralgorithm.cpp:281 0x0000000000437cd9
7 boost::_bi::list1<boost::_bi::value<int> >::operator()<void (*)(int), boost::_bi::list0>() /usr/local/include/boost-1_38/boost/bind.hpp:232 0x000000000043f25a
6 boost::_bi::bind_t<void, void (*)(int), boost::_bi::list1<boost::_bi::value<int> > >::operator() /usr/local/include/boost-1_38/boost/bind/bind_template.hpp:20 0x000000000043f298
5 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(int), boost::_bi::list1<boost::_bi::value<int> > > >::run() /usr/local/include/boost-1_38/boost/thread/detail/thread.hpp:56 0x000000000043f2b6
4 thread_proxy() 0x00007f28241c893f
3 start_thread() 0x00007f28243d93ba
2 clone() 0x00007f2822a4ffcd
1 <symbol is not available> 0x0000000000000000
The line of code where it happens is the second of this block:
const_reference operator () (size_type i, size_type j) const {
const size_type element = layout_type::element (i, size1_, j, size2_);
const_subiterator_type it (data ().find (element));
I'd like to reiterate that the seg fault doesn't always occur at the same place in the code, but always when executing something in the boost code.
Thanks in advance for your help!