tags:
views: 316
answers: 4

I am chaining together 15 async operations through ports and receivers. This has left me very concerned with the inter-thread messaging time, specifically the time between one task posting data to a port and a new task beginning to process that same data on a different thread. Assuming the best-case situation where each thread is idle at the start, I have written a test that uses the Stopwatch class to measure the time across two different dispatchers, each operating at highest priority with a single thread.

What I found surprised me. My development rig is a Q6600 quad-core 2.4 GHz machine running Windows 7 x64, and the average context-switch time from my test was 5.66 microseconds, with a standard deviation of 5.738 microseconds and a maximum of nearly 1.58 milliseconds (a factor of 282!). The Stopwatch resolution is 427.7 nanoseconds, so I am still well above the timer's noise floor.

What I would like to do is reduce the inter-thread messaging time as much as possible and, equally important, reduce its standard deviation. I realize Windows is not a real-time OS and there are no guarantees, but the Windows scheduler is a fair, round-robin, priority-based scheduler, and the two threads in this test are both at the highest priority (the only threads that should be that high), so there should not be any context switches away from these threads (evidenced by the 1.58 ms largest time... I believe the Windows quantum is 15.65 ms?). The only thing I can think of is variation in the timing of the OS calls to the locking mechanisms the CCR uses to pass messages between threads.

Please let me know if anyone else out there has measured inter-thread messaging time and has any suggestions on how to improve it.

Here is the source code from my tests:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using Microsoft.Ccr.Core;

using System.Diagnostics;

namespace Test.CCR.TestConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting Timer");
            var sw = new Stopwatch();
            sw.Start();

            var dispatcher = new Dispatcher(1, ThreadPriority.Highest, true, "My Thread Pool");
            var dispQueue = new DispatcherQueue("Disp Queue", dispatcher);

            var sDispatcher = new Dispatcher(1, ThreadPriority.Highest, true, "Second Dispatcher");
            var sDispQueue = new DispatcherQueue("Second Queue", sDispatcher);

            var legAPort = new Port<EmptyValue>();
            var legBPort = new Port<TimeSpan>();

            var distances = new List<double>();

            long totalTicks = 0;

            // Busy-wait ~5 seconds so the threads and timer settle before measuring.
            while (sw.Elapsed.TotalMilliseconds < 5000) ;

            int runCnt = 100000; // total round trips to measure
            int offset = 1000;   // warm-up samples excluded from the average

            // Leg A: stamp the current time and forward it to leg B's port.
            Arbiter.Activate(dispQueue, Arbiter.Receive(true, legAPort, i =>
            {
                TimeSpan sTime = sw.Elapsed;
                legBPort.Post(sTime);
            }));

            // Leg B: compute the one-way latency, record it, and loop back to leg A.
            Arbiter.Activate(sDispQueue, Arbiter.Receive(true, legBPort, i =>
            {
                TimeSpan eTime = sw.Elapsed;
                TimeSpan dt = eTime.Subtract(i);
                //if (distances.Count == 0 || Math.Abs(distances[distances.Count - 1] - dt.TotalMilliseconds) / distances[distances.Count - 1] > 0.1)
                distances.Add(dt.TotalMilliseconds);

                // Skip the first 'offset' samples so warm-up doesn't skew the average.
                if (distances.Count > offset)
                    Interlocked.Add(ref totalTicks, dt.Ticks);
                if (distances.Count < runCnt)
                    legAPort.Post(EmptyValue.SharedInstance);
            }));


            //Thread.Sleep(100);
            legAPort.Post(EmptyValue.SharedInstance);

            Thread.Sleep(500);

            while (distances.Count < runCnt)
                Thread.Sleep(25);

            TimeSpan exTime = TimeSpan.FromTicks(totalTicks);
            double exMS = exTime.TotalMilliseconds / (runCnt - offset);

            Console.WriteLine("Exchange Time: {0} ms  Stopwatch Frequency: {1} ticks/sec", exMS, Stopwatch.Frequency);

            using(var stw = new StreamWriter("test.csv"))
            {
                for(int ix=0; ix < distances.Count; ix++)
                {
                    stw.WriteLine("{0},{1}", ix, distances[ix]);
                }
                stw.Flush();
            }

            Console.ReadKey();
        }
    }
}
A: 

ThreadPriority.Highest doesn't mean nothing else can run at a higher priority. The Win32 API has a more granular set of thread priorities, with several levels above Highest (IIRC, Highest is typically the highest priority non-admin code can run at; admins can schedule higher, as can any hardware drivers / kernel-mode code), so there is no guarantee your threads will not be pre-empted.
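For illustration, here is a minimal sketch (my own, not from the question) of reaching one of those Win32 levels above Highest via P/Invoke to SetThreadPriority; the constant values come from the Windows headers:

```csharp
using System;
using System.Runtime.InteropServices;

class PrioritySketch
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetThreadPriority(IntPtr hThread, int nPriority);

    // THREAD_PRIORITY_HIGHEST is 2; TIME_CRITICAL (15) sits above it.
    const int THREAD_PRIORITY_TIME_CRITICAL = 15;

    static void Main()
    {
        // Raise the current thread beyond what ThreadPriority.Highest maps to.
        if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL))
            Console.WriteLine("SetThreadPriority failed: " + Marshal.GetLastWin32Error());
    }
}
```

Even at TIME_CRITICAL, kernel-mode code can still pre-empt you, so this narrows the gap rather than closing it.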

Even if a thread is running at the highest priority, Windows can boost other threads above their base priority if they are holding resource locks that higher-priority threads require, which is another reason you may be suffering context switches.

Even then, as you say, Windows isn't a real time OS and it isn't guaranteed to honour thread priorities anyway.

PlausiblyDamp
A: 

To attack this problem a different way: do you need so many decoupled asynchronous operations? It may be useful to think about vertically partitioning the work (asynchronously processing numCores chunks of data end to end) instead of horizontally partitioning it (as now, with each chunk of data processed in 15 decoupled stages), or about synchronously coupling some of your 15 stages to reduce the total to a smaller number.

The overhead of inter-thread communication will always be non-trivial. If some of your 15 operations are doing only a small chunk of work, the context switches will bite you.
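Vertical partitioning can be sketched like this (a sketch of mine, not from the answer; Stage1/Stage2 are hypothetical stand-ins for the real operations):

```csharp
using System;
using System.Threading;

class VerticalPartitionSketch
{
    // Hypothetical stand-ins for the real pipeline stages.
    static int Stage1(int x) { return x + 1; }
    static int Stage2(int x) { return x * 2; }

    static void Main()
    {
        int[] data = new int[100000];
        int numCores = Environment.ProcessorCount;
        Thread[] workers = new Thread[numCores];
        int chunk = data.Length / numCores;

        for (int t = 0; t < numCores; t++)
        {
            int start = t * chunk;
            int end = (t == numCores - 1) ? data.Length : start + chunk;
            workers[t] = new Thread(delegate()
            {
                // Each worker runs every stage end to end on its own slice:
                // no cross-thread hop between stages.
                for (int i = start; i < end; i++)
                    data[i] = Stage2(Stage1(data[i]));
            });
            workers[t].Start();
        }

        foreach (Thread w in workers) w.Join();
    }
}
```

Here the only synchronization is one Join per core, instead of 14 port hops per item.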

Alex
+1  A: 

Windows is not a real-time OS, but you knew that already. What is killing you is the context-switch time, not necessarily the message time. You didn't really specify HOW your inter-process communication works. If you're really just running multiple threads, you'll find some gains by not using Windows messages as a communication protocol; instead, try rolling your own IPC using application-hosted message queues.
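An application-hosted message queue can be as small as this (an illustrative sketch, not from the answer; a production version would also need shutdown handling):

```csharp
using System.Collections.Generic;
using System.Threading;

// A minimal in-process message queue: one lock, one condition variable.
class MessageQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object gate = new object();

    public void Post(T item)
    {
        lock (gate)
        {
            items.Enqueue(item);
            Monitor.Pulse(gate); // wake one waiting consumer
        }
    }

    public T Take()
    {
        lock (gate)
        {
            while (items.Count == 0)
                Monitor.Wait(gate); // releases the lock while blocked
            return items.Dequeue();
        }
    }
}
```

The consumer thread loops on Take(); the only kernel involvement is the Monitor wait/pulse when the queue runs dry.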

The best average you can hope for is about 1 ms on any version of Windows when context switches occur. You're probably seeing the ~1 ms times when your application has to yield to the kernel. This is by design for user-mode (Ring 3) applications. If it's absolutely critical that you get below 1 ms, you'll need to move some of your application into Ring 0, which means writing a device driver.

Device drivers don't suffer the same context-switch costs that user applications do, and they have access to high-resolution timers and sleep calls as well. If you do need to go this route, the DDK (Driver Development Kit) is freely available from Microsoft, but I would HIGHLY recommend investing in a third-party development kit. They usually have really good samples and lots of wizards to set things up correctly, which would otherwise take you months of reading DDK documentation to discover. You'll also want something like SoftICE, because the normal Visual Studio debugger isn't going to help you debug device drivers.

keick
+2  A: 

Do the 15 asynchronous operations have to be asynchronous? i.e. are you forced to operate this way by a limitation of some library, or do you have the choice to make synchronous calls?

If you have the choice, you should structure your application so that the use of asynchronicity is controlled by configuration parameters. The difference between asynchronous operations that return on a different thread and synchronous operations that return on the same thread should be transparent in the code; that way you can tune it without changing the code structure.
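One way to make that choice transparent, sketched with a plain delegate (my illustration; in the real app the async branch would be a CCR port.Post with a receiver, not the thread pool):

```csharp
using System;
using System.Threading;

class StageRunner
{
    // Hypothetical switch; in practice this would be read from configuration.
    public static bool UseAsync = false;

    // Hand a value to the next stage either on the current thread
    // or on a different one, without the caller knowing which.
    public static void RunNext(Action<int> nextStage, int value)
    {
        if (UseAsync)
            ThreadPool.QueueUserWorkItem(delegate(object state) { nextStage(value); });
        else
            nextStage(value); // synchronous: no context switch at this boundary
    }
}
```

Flipping UseAsync then lets you measure exactly what each extra thread boundary costs.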

The phrase "embarrassingly parallel" describes an algorithm in which the majority of the work being done is obviously independent and so can be done in any order, making it easy to parallelise.

But you are "chaining together 15 async operations through ports and receivers". This could be described as "embarrassingly sequential". In other words, the same program could logically be written on a single thread. But then you'd lose any parallelism for the CPU-bound work occurring between the async operations (assuming there is any of significance).

If you write a simple test to cut out any significant CPU-bound work and just measure the context switching time, then guess what, you're going to be measuring the variation in the context switching time, as you've discovered.

The only reason for running multiple threads is because you have significant amounts of work for CPUs to do, and so you'd like to share it out between several CPUs. If the individual chunks of computation are short-lived enough, then context switching will be a significant overhead on any OS. By breaking your computation down into 15 stages, each very short, you are essentially asking the OS to do a lot of unnecessary context switching.

Daniel Earwicker