I'm debugging a database project that wraps access to SQLite for a higher-level application. It's designed to run asynchronously; that is, it has methods like ExecuteRequestAsync() and IsRequestReady(). When ExecuteRequestAsync() is called, it spawns a boost::thread to do the job and returns immediately. When the higher-level application decides that it no longer wants the result of a running request, it may call DumpRequest() to cancel it. Since it's difficult to cancel a database request gracefully, the implementation of DumpRequest() just maintains a "cleanup monitor thread" that waits for finished requests and removes them. All boost::threads are managed through boost::shared_ptr, like:

// The shared_ptr constructor taking a raw pointer is explicit, so use direct initialization.
boost::shared_ptr<boost::thread> my_thread(new boost::thread(boost::bind(&DBCon::RunRequest, &this_dbcon)));

And when a thread is no longer needed (that is, its request has been canceled):

vector<boost::shared_ptr<boost::thread> > threads_tobe_removed;
// some iteration over i
threads_tobe_removed[i]->join();  // wait for the request thread to finish
threads_tobe_removed.erase(threads_tobe_removed.begin() + i);  // release this shared_ptr reference
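The join-then-erase step above can be sketched as follows. This is a minimal sketch, not the project's actual code. It assumes C++11 and uses std::thread, std::shared_ptr, and std::mutex as stand-ins for the Boost equivalents so that it compiles without Boost. The names DumpQueue, Dump, and Drain are hypothetical. The pending list is swapped out under the lock and joined outside it, so the DumpRequest() side is never blocked behind a long join.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the list of dumped request threads.
class DumpQueue {
public:
    typedef std::shared_ptr<std::thread> ThreadPtr;

    // Called from the DumpRequest() side: hand a running thread over for cleanup.
    void Dump(const ThreadPtr& t) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(t);
    }

    // Called from the cleanup monitor: join every dumped thread, then release it.
    // Returns the number of threads taken from the pending list.
    std::size_t Drain() {
        std::vector<ThreadPtr> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);  // take ownership while holding the lock
        }
        for (std::size_t i = 0; i < batch.size(); ++i) {
            if (batch[i]->joinable()) {
                batch[i]->join();  // always join before the thread object is destroyed
            }
        }
        return batch.size();  // batch's references are released when it goes out of scope
    }

private:
    std::mutex mutex_;
    std::vector<ThreadPtr> pending_;
};
```

One difference from the stand-in is worth keeping in mind: destroying a std::thread that is still joinable calls std::terminate, so a missed join fails loudly here instead of going unnoticed.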

I created a unit test project to test the mechanism of executing and dumping requests. It runs requests, randomly cancels running ones, and repeats for several thousand passes. The mechanism turned out to be okay: everything worked as expected.

However, observing the unit test project in Sysinternals Process Explorer reveals a handle leak. Roughly every 500 passes, the handle count increases by 1 and never drops back. It's the "Event" type handle that is increasing. File and thread handles are not leaking (the handle count does rise as threads are spawned, but there is a Sleep(10000) call every hundred passes to wait for them to be cleaned up so that the handle count can be observed).

I haven't been managing Event handles myself. They are created by boost::thread when a thread is created. I only guarantee that the threads are closed gracefully; I have no idea what the Events are used for.

Has anyone experienced similar problems? What might be the cause of this leak? Is the number in Process Explorer reliable enough to call it a handle leak? Is there any way to trace and fix it?
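One way to cross-check the Process Explorer reading is to sample the handle count from inside the test itself. The sketch below uses the Win32 call GetProcessHandleCount, which reports the total number of open handles for a process. It covers all handle types, so it cannot single out Event handles on its own. The helper names CurrentHandleCount and HandleGrowth are hypothetical, and the non-Windows branch exists only so the sketch compiles elsewhere.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: total open handles of the current process,
// or -1 where the Win32 call is unavailable or fails.
long CurrentHandleCount() {
#ifdef _WIN32
    DWORD count = 0;
    if (GetProcessHandleCount(GetCurrentProcess(), &count)) {
        return static_cast<long>(count);
    }
#endif
    return -1;
}

// Hypothetical helper: handles gained between two samples.
// Returns 0 if either sample is invalid.
long HandleGrowth(long before, long after) {
    if (before < 0 || after < 0) {
        return 0;
    }
    return after - before;
}
```

Taking one sample after each Sleep(10000) pause and logging HandleGrowth against the first sample would show whether the count really climbs by about one every 500 passes, independent of the tool used to view it.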

I'm using statically linked boost 1.40 on Windows Vista, with Visual C++.

+1  A: 

Is access to threads_tobe_removed thread-safe? If not, there may be a race condition in which one thread adds an entry to the vector via a call to DumpRequest() while the cleanup monitor thread erases an entry from it. boost::thread objects could then be destroyed without the thread being joined first, which would leave the thread running without an associated object and might explain the leak.

Space_C0wb0y
Of course it's locked using a mutex. Synchronization isn't a problem here; locking is done where necessary. Otherwise, wouldn't it cause a thread handle leak instead of an event handle leak? What's weird is that the leak only happens with about 0.2% of the requests.
He Shiming
I think a leak of 0.2% is much more likely than any other number. Boost contributors are talented coders. They wouldn't miss a simple 100% leak. It's much more likely that the leak comes from a subtle race condition that rarely happens. However, by default I assume Boost is doing the right thing. If you really think you've found something, strip it down to the smallest amount of code that reproduces the problem and post it to the Boost mailing lists.
caspin