I have a program that gets a KERN_PROTECTION_FAILURE with EXC_BAD_ACCESS in a very strange place when running multithreaded and I haven't the faintest idea how to troubleshoot it further. This is on MacOS 10.6 using GCC.
The very strange place that it gets this is when entering a function. Not on the first line of the function, but the actual jump to the function GetMachineFactors():
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb00009ec
[Switching to process 28242]
0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
168 MachineFactors* GetMachineFactors()
(gdb) bt
#0 0x00012592 in GetMachineFactors () at ../sysinfo/OSX.cpp:168
#1 0x000156d0 in CollectMachineFactorsThreadProc (parameter=0x200280) at Threads.cpp:341
#2 0x952f681d in _pthread_start ()
#3 0x952f66a2 in thread_start ()
(gdb)
If I run this non-threaded, it runs great, no issues:
#include "MachineFactors.h"
int main( int argc, char** argv )
{
MachineFactors* factors = GetMachineFactors();
std::string str = CreateJSONObject(factors);
cout << str;
delete factors;
return 0;
}
If I run this in a pthread, I get the EXC_BAD_ACCESS above.
THREAD_FUNCTION CollectMachineFactorsThreadProc( LPVOID parameter )
{
Main* client = (Main*) parameter;
if ( parameter == NULL )
{
ERRORLOG( "No data passed to machine identification thread. Aborting." );
return 0;
}
MachineFactors* mfactors = GetMachineFactors(); // This is where it dies.
// If I don't call GetMachineFactors and do something like mfactors =
// new MachineFactors(); everything is good and the threads communicate and exit
// normally.
if (mfactors == NULL)
{
ERRORLOG("Failed to collect machine identification: GetMachineFactors returned NULL." << endl)
return 0;
}
client->machineFactors = CreateJSONObject(mfactors);
delete mfactors;
EVENT_RAISE(client->machineFactorsEvent);
return 0;
}
Here is an excerpt from the GetMachineFactors() code:
MachineFactors* GetMachineFactors() // Dies on this line in multi-threaded.
{
// printf( "Getting machine factors.\n"); // Tried with and without this, never prints.
factors = new MachineFactors();
factors->OSName = "MacOS";
factors->Manufacturer = "Apple";
///…
// gather various machine metrics here.
//…
return factors;
}
For reference, I am using a socketpair to wait on the thread to complete:
// From the header file I use for cross-platform defines (this runs on OSX, Windows, and Linux.
struct _waitt
{
int fds[2];
};
#define THREAD_FUNCTION void*
#define THREAD_REFERENCE pthread_t
#define MUTEX_REFERENCE pthread_mutex_t*
#define MUTEX_LOCK(m) pthread_mutex_lock(m)
#define MUTEX_UNLOCK pthread_mutex_unlock
#define EVENT_REFERENCE struct _waitt
#define EVENT_WAIT(m) do { char lc; if (read(m.fds[0], &lc, 1)) {} } while (0)
#define EVENT_RAISE(m) do { char lc = 'j'; if (write(m.fds[1], &lc, 1)) {} } while (0)
#define EVENT_NULL(m) do { m.fds[0] = -1; m.fds[1] = -1; } while (0)
Here is the code where I launch the thread.
void Main::CollectMachineFactors()
{
#ifdef WIN32
machineFactorsThread = CreateThread(NULL, 0, CollectMachineFactorsThreadProc, this, 0, 0);
if ( machineFactorsThread == NULL )
{
ERRORLOG( "Could not create thread for machine id: " << ERROR_NO << endl )
}
#else
int retval = pthread_create(&machineFactorsThread, NULL, CollectMachineFactorsThreadProc, this);
if (retval)
{
ERRORLOG( "Return code from machine id pthread_create() is " << retval << endl )
}
#endif
}
Here's the simple failure case of running this multithreaded. It always fails for this code with the stack trace above:
CollectMachineFactors();
EVENT_WAIT(machineFactorsEvent);
cout << machineFactors;
return 0;
At first I suspected a library problem. Here's my makefile:
# Main executable file
PROGRAM = sysinfo
# Object files
OBJECTS = Version.h Main.o Protocol.o Socket.o SSLConnection.o Stats.o TimeElapsed.o Formatter.o OSX.o Threads.o
# Include directories
INCLUDE = -Itaocrypt/include -IyaSSL/taocrypt/mySTL -IyaSSL/include -isysroot /Developer/SDKs/MacOSX10.5.sdk -mmacosx-version-min=10.5
# Library settings
STATICLIBS = libtaocrypt.a libyassl.a -Wl,-rpath,. -ldl -lpthread -lz -lexpat
# Compile settings
RELCXX = g++ -g -ggdb -DDEBUG -Wall $(INCLUDE)
.SUFFIXES: .o .cpp
.cpp.o :
$(RELCXX) -c -Wall $(INCLUDE) -o $@ $<
all: $(PROGRAM)
$(PROGRAM): $(OBJECTS)
$(RELCXX) -o $(PROGRAM) $(OBJECTS) $(STATICLIBS)
clean:
rm -f *.o $(PROGRAM)
I can't for the life of me see anything particularly odd or dangerous and I'm not sure where to look. The same threaded process works fine on any Linux machine I have tried. Any suggestions? Any tools I should try?
I can add more info if it would be helpful.