I have a performance issue where clients are creating hundreds of a particular kind of object "Foo" in my unmanaged (not .NET) C++ application's DOM. Each Foo instance has its own asynchronous work queue with its own thread. Obviously, that doesn't scale.

I need to share threads amongst work queues, and I don't want to re-invent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool. The work that needs to be done to process each queue item involves making COM calls in the multi-threaded COM apartment. The documentation for the XP thread pool says that it is okay to call CoInitializeEx() with the MTA apartment in the thread worker function callback. I've written a test app and verified that this works. I made the app run 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the WorkItem callback function. It takes 35 seconds with the CoInit* calls and 5 seconds without them. That's way too much overhead for my application. Since the thread pool is per-process and 3rd-party code runs in my process, I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().

Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?

+1  A: 

I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().

Windows will clean up if a thread exits without calling CoUninitialize. We know this works because, if it didn't, there would be no cleanup when threads crash or are aborted.

So the only way this hack could cause a problem is if someone tried to queue work items that needed an STA, which seems unlikely.

I'd be tempted to go for it.

John Knoeller
+1  A: 

Have you verified what is taking so long? Is it really the call to CoInitializeEx()? You definitely don't need to call CoInitialize once per task. You also don't say how many threads you spawn. If you're running on a dual core and your work is CPU intensive, don't expect more than a 2x speedup. If your work isn't CPU intensive, then it's waiting on some resource (memory, disk, network) and speedups will be similarly constrained, perhaps made worse if a lock is held for that resource.

If you can use Visual Studio 2010, take a look at the Parallel Patterns Library and the Asynchronous Agents Library. They include a couple of tools that can reduce the amount of code you need to write for this.

If you can't, you can at least try placing a token in TLS that records whether COM has been initialized on that thread, and use the presence of this token to skip your calls to CoInitialize when they aren't needed.
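A minimal sketch of that TLS token idea, under some assumptions: it uses C++11 `thread_local` for brevity (code for older compilers would use TlsAlloc/TlsGetValue/TlsSetValue instead), and the names `EnsureComInitialized`, `InitComForThisThread`, `ProcessQueueItem` and `g_init_calls` are made up for illustration. Off Windows the COM call is replaced by a stand-in so the snippet still compiles. Note that it never calls CoUninitialize, so it only fits if leaving pool threads in the MTA is acceptable for your process.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#ifdef _WIN32
#include <objbase.h>  // CoInitializeEx; link with ole32.lib
#endif

// Illustration only: counts how often initialization was actually attempted.
static std::atomic<int> g_init_calls{0};

// Hypothetical helper: joins the MTA on Windows.
// On other platforms it is a stand-in that always succeeds.
static bool InitComForThisThread() {
    ++g_init_calls;
#ifdef _WIN32
    HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
    // S_OK and S_FALSE both succeed; RPC_E_CHANGED_MODE (thread is
    // already in an STA) fails and is reported to the caller.
    return SUCCEEDED(hr);
#else
    return true;
#endif
}

// Returns true if COM is usable on the calling thread.
// After the first success on a thread, later calls only read a flag.
bool EnsureComInitialized() {
    thread_local bool initialized = false;  // the per-thread "token"
    if (!initialized) {
        initialized = InitComForThisThread();
    }
    return initialized;
}

// Hypothetical work-item body, called from the thread pool callback.
void ProcessQueueItem() {
    if (!EnsureComInitialized()) {
        return;  // could not join the MTA on this thread
    }
    // ... make the COM calls for this queue item here ...
}
```

With this in place, the per-item cost is a flag check rather than a CoInitializeEx/CoUninitialize pair. If initialization fails on a thread, the flag stays false and the next item on that thread will try again.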

Rick
The 30 seconds is definitely the CoInitializeEx/CoUninitialize pair. I am operating under the assumption that I shouldn't change the state of a worker thread managed by the Win32 thread pool, so it seems to me that it is essential to call CoUninitialize before completing the task callback. The work is mostly disk I/O, but there's high CPU utilization due to other parallel tasks, so there isn't any CPU to spare. Thanks for the tip on VS 2010; I look forward to trying it when I upgrade.
David Gladfelter