I have a performance issue: clients are creating hundreds of instances of a particular kind of object, "Foo", in my unmanaged (not .NET) C++ application's DOM. Each Foo instance has its own asynchronous work queue serviced by its own thread, so hundreds of Foos mean hundreds of threads. Obviously, that doesn't scale.
I need to share threads amongst work queues, and I don't want to reinvent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool API. The work done for each queue item involves making COM calls in the multithreaded apartment (MTA).

The documentation for the XP thread pool says it is okay to call CoInitializeEx() with COINIT_MULTITHREADED inside the work item callback, and I've written a test app that confirms this works. I ran 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the work item callback: the run takes 35 seconds with the calls and 5 seconds without them, roughly 30 microseconds of extra cost per work item. That is far too much overhead for my application.

Since the thread pool is shared by the whole process and third-party code also runs in my process, I'm assuming it isn't safe to call CoInitializeEx() once per pool thread and never call CoUninitialize(): other code that later gets work on the same thread would find it already in the MTA, and a CoInitializeEx() call asking for a single-threaded apartment there would fail with RPC_E_CHANGED_MODE.
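One common way around per-item initialization cost, assuming the application owns its worker threads instead of borrowing the process-wide pool, is to do the per-thread setup once when each worker starts and undo it once when that worker exits. The sketch below is portable C++11 so it stays self-contained: the `on_start`/`on_exit` hooks mark where `CoInitializeEx(nullptr, COINIT_MULTITHREADED)` and `CoUninitialize()` would presumably go (shown only as comments), and `WorkerPool`, `DemoCounts` and `run_pool_demo` are hypothetical names, not part of any Windows API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical application-owned pool: each worker runs on_start once when it
// starts and on_exit once before it ends, on that same thread, so per-thread
// setup (e.g. COM initialization) is paid per worker, not per work item.
class WorkerPool {
public:
    using Task = std::function<void()>;
    using Hook = std::function<void()>;

    WorkerPool(std::size_t threads, Hook on_start, Hook on_exit)
        : on_start_(std::move(on_start)), on_exit_(std::move(on_exit)) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Signals shutdown; workers finish every queued task before exiting.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(Task task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        on_start_();  // e.g. CoInitializeEx(nullptr, COINIT_MULTITHREADED)
        for (;;) {
            Task task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) break;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs outside the lock
        }
        on_exit_();  // e.g. CoUninitialize()
    }

    Hook on_start_;
    Hook on_exit_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Task> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

struct DemoCounts {
    int starts;
    int exits;
    int tasks;
};

// Counts how often the hooks run compared with how many tasks execute.
DemoCounts run_pool_demo(std::size_t threads, int items) {
    std::atomic<int> starts{0}, exits{0}, done{0};
    {
        WorkerPool pool(threads, [&] { ++starts; }, [&] { ++exits; });
        for (int i = 0; i < items; ++i)
            pool.submit([&] { ++done; });
    }  // destructor drains the queue and joins all workers
    return {starts.load(), exits.load(), done.load()};
}
```

With 4 workers and 10,000 items, each hook runs 4 times while the task body runs 10,000 times. Because the workers belong to the application, no third-party code ever lands on a thread whose apartment it did not choose; the trade-off is maintaining the pool yourself. In this sketch an exception escaping a task would terminate the process.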
Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?
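If a custom thread-sharing system does turn out to be necessary, the central piece is a per-object serial queue (often called a "strand"): each Foo keeps its own FIFO of items, but instead of owning a thread it hands a single "drain" job to a shared executor only when it goes from idle to busy. Items for one Foo still run in order and never concurrently, while hundreds of Foos share a handful of threads. This is a minimal sketch: `SerialQueue`, `Executor`, `ManualExecutor`, `SerialDemo` and `run_serial_demo` are hypothetical names, and in a real application the `Executor` would presumably wrap `QueueUserWorkItem` or an application-owned pool. The demo uses a deterministic stand-in executor that runs jobs on the calling thread so the output is predictable.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Task = std::function<void()>;
// Anything that runs a task on some shared thread. In a real app this might
// wrap QueueUserWorkItem or an application-owned pool (assumption).
using Executor = std::function<void(Task)>;

// Hypothetical per-object queue ("strand"): tasks run in FIFO order and never
// overlap, but the queue owns no thread. It submits one drain job to the
// shared executor only when it goes from idle to busy.
class SerialQueue : public std::enable_shared_from_this<SerialQueue> {
public:
    explicit SerialQueue(Executor exec) : exec_(std::move(exec)) {}

    void post(Task task) {
        bool schedule = false;
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(task));
            if (!running_) { running_ = true; schedule = true; }
        }
        if (schedule) {
            auto self = shared_from_this();  // keeps the queue alive until drained
            exec_([self] { self->drain(); });
        }
    }

private:
    // Runs on a borrowed thread until the queue is empty, then gives it back.
    void drain() {
        for (;;) {
            Task task;
            {
                std::lock_guard<std::mutex> lock(mu_);
                if (pending_.empty()) { running_ = false; return; }
                task = std::move(pending_.front());
                pending_.pop_front();
            }
            task();
        }
    }

    Executor exec_;
    std::mutex mu_;
    std::deque<Task> pending_;
    bool running_ = false;
};

// Hypothetical deterministic stand-in for a thread pool: jobs are queued and
// later run one by one on the calling thread.
struct ManualExecutor {
    std::deque<Task> jobs;
    void run_all() {
        while (!jobs.empty()) {
            Task job = std::move(jobs.front());
            jobs.pop_front();
            job();
        }
    }
};

struct SerialDemo {
    std::vector<std::string> log;  // order in which items actually ran
    int scheduled_jobs;            // drain jobs handed to the shared executor
};

// Two "Foo" queues share one executor and post three items each.
SerialDemo run_serial_demo() {
    ManualExecutor pool;
    Executor exec = [&pool](Task t) { pool.jobs.push_back(std::move(t)); };
    auto fooA = std::make_shared<SerialQueue>(exec);
    auto fooB = std::make_shared<SerialQueue>(exec);
    SerialDemo result{{}, 0};
    for (int i = 0; i < 3; ++i) {
        fooA->post([&result, i] { result.log.push_back("A" + std::to_string(i)); });
        fooB->post([&result, i] { result.log.push_back("B" + std::to_string(i)); });
    }
    result.scheduled_jobs = static_cast<int>(pool.jobs.size());
    pool.run_all();
    return result;
}
```

Six posted items produce only two executor jobs, one per Foo, and the log comes out as A0 A1 A2 B0 B1 B2. One design caveat: a queue that is posted to continuously keeps its drain job running indefinitely, so a production version would probably cap the number of items handled per drain and re-post itself to stay fair to other queues.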