365 views · 3 answers
Not that I'm not appreciative of the powers of multithreading or ThreadPool, but I'm scared I broke something since I'm getting a roughly 20x speed increase (2-3s down from over a minute) with a relatively naive usage of ThreadPool. So I submit my code here to be torn apart by people far wiser than I.

Am I doing something wrong here, or is this just a far better candidate for multithreading than I ever hoped? (Yes, this function is an entire thread: like I said, this used to take over a minute to run)

EDIT: To answer my own question: no, this is broken. It seems to be running multiple times, but over the same trigger. Is this because of the way lambdas are handled?

private static void CompileEverything()
{
    try
    {
        // maintain the state of our systray icon
        object iconLock = new object();
        bool iconIsOut = true;

        // keep a count of how many threads are still running
        object runCountLock = new object();
        int threadRunning = 0;

        foreach (World w in Worlds)
        {
            foreach (Trigger t in w.Triggers)
            {
                lock (runCountLock)
                {
                    threadRunning++;
                }

                ThreadPool.QueueUserWorkItem(o =>
                {
                    // [snip]: Do some work involving compiling code already in memory with CSharpCodeProvider

                    // provide some pretty feedback
                    lock (iconLock)
                    {
                        if (iconIsOut)
                            notifyIcon.Icon = Properties.Resources.Icon16in;
                        else
                            notifyIcon.Icon = Properties.Resources.Icon16out;

                        iconIsOut = !iconIsOut;
                    }

                    lock (runCountLock)
                    {
                        threadRunning--;
                    }
                });
            }
        }

        // wait for all the threads to finish up
        while (true)
        {
            lock (runCountLock)
            {
                if (threadRunning == 0)
                    break;
            }
        }

        // set the notification icon to our default icon.
        notifyIcon.Icon = Properties.Resources.Icon16;
    }
    // we're going down before we finished starting...
    // oh well, be nice about it.
    catch (ThreadAbortException) { }
}
A: 

Well, the ThreadPool automatically limits the number of running threads to the number of processors, which is maximally efficient. Each context switch can swap up to 1 MB (the default stack size) of memory in 4 KB page increments, so if you were using a lot more threads than cores, you could be getting a ton of speed just from avoiding context switches.

Adam A
Please see my edit... Seems I was right, I did break something after all...
Matthew Scharley
A context switch doesn't cause 1 MB of "memory swapping". By default, a Win32 stack is 1 MB of virtual memory, but most threads will never use that much. Also, when a context switch does happen, the only thing that changes related to the stack is the `ESP` register now points to the stack of the new thread. No memory needs to be actually moved anywhere.
Greg Hewgill
Unless of course the context switch involves paging (which is a real issue on my current dev machine, with only 256 MB of memory), but that's another issue altogether...
Matthew Scharley
It would only page out the actual pages used, which means in 4 KB increments, not 1 MB, so it's not as bad as it sounds. Having said that, the overhead of context switches is never going to be trivial, so it's best to have no more threads than actually benefits you. This number is going to depend on a lot of factors, and is probably best determined by trial and error.
Steven Sudit
Thanks for the clarification gents. I guess I only know enough to hang myself =)
Adam A
You fixed it, so I negated your negative.
Steven Sudit
+1  A: 

I think you can do better. There is no need to lock around the changes to threadRunning. You can just use Interlocked.Increment() and Interlocked.Decrement():

        Interlocked.Increment(ref threadRunning);
        ThreadPool.QueueUserWorkItem(o =>
        {
            // [snip]: Do some work involving compiling code already in memory with CSharpCodeProvider

            // provide some pretty feedback
            lock (iconLock)
            {
                notifyIcon.Icon = (iconIsOut ? Properties.Resources.Icon16in : Properties.Resources.Icon16out);
                iconIsOut = !iconIsOut;
            }

            Interlocked.Decrement(ref threadRunning);
        });
hughdbrown
+4  A: 

Interlocked.Increment is better than locking, but the polling loop at the end scares me. First, if you're going to loop, then do a Thread.Sleep(0) to release the processor each time. Second, if you're going to poll for a variable, then you need to make sure it's either marked volatile or you use MemoryBarrier, else the compiler may assume no outside thread will change it and therefore optimize away the check, leading to an infinite loop.

Even better would be for each thread to check for it hitting zero and set an event if it does. You can then wait on the event instead of polling. The only trick is that you want to increment once in the main thread before the dispatch loop, then decrement and check for zero before waiting on the event.
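A minimal sketch of that pattern, with the pre-increment trick included (`Compile` and the `triggers` collection are placeholders standing in for the snipped work and the question's nested loops):

```csharp
using System.Threading;

int threadRunning = 1; // pre-increment: keeps the count above zero while we queue
using (ManualResetEvent allDone = new ManualResetEvent(false))
{
    foreach (Trigger t in triggers) // placeholder for the Worlds/Triggers loops
    {
        Interlocked.Increment(ref threadRunning);
        Trigger trigger = t; // per-iteration copy for the closure

        ThreadPool.QueueUserWorkItem(o =>
        {
            try { Compile(trigger); } // placeholder for the real work
            finally
            {
                // whichever decrement hits zero signals completion
                if (Interlocked.Decrement(ref threadRunning) == 0)
                    allDone.Set();
            }
        });
    }

    // release the main thread's count; if all work items already finished,
    // this is the decrement that reaches zero and sets the event
    if (Interlocked.Decrement(ref threadRunning) == 0)
        allDone.Set();

    allDone.WaitOne(); // block without burning CPU
}
```

Without the pre-increment, the count could momentarily reach zero between queueing the first and second work items, setting the event too early.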

edit

If it's broken because it's reusing the trigger, then the closure is wrong. Try copying the value of the trigger into a variable local to the inside of the loop and using that variable in the lambda expression.
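In code, the fix is just a per-iteration copy (a sketch; `Compile` stands in for the snipped work):

```csharp
foreach (World w in Worlds)
{
    foreach (Trigger t in w.Triggers)
    {
        // Without this copy, every lambda captures the same loop variable,
        // so late-starting work items all see the last trigger assigned to it.
        Trigger trigger = t;

        ThreadPool.QueueUserWorkItem(o =>
        {
            Compile(trigger); // placeholder for the real work
        });
    }
}
```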

Steven Sudit
Agree. I fixed it by passing `t` into the ThreadPool as its state variable and away we go.
Matthew Scharley
Good. How's the performance now?
Steven Sudit
Also, volatile can't be set on function local variables. `Interlocked.Equals`?
Matthew Scharley
Still much better than without the ThreadPool, but I've yet to do any real measurements.
Matthew Scharley
I agree with Steve: the polling loop is an error and you do want an EventWaitHandle for the threads to signal.
hughdbrown
Runtime is now about 10-15 seconds. So even so, a 4-6x speedup on a dual-core system is a significant improvement.
Matthew Scharley
Yes, that sounds like a realistic speedup. You can promote the thread count variable to a volatile static field, and then you can check the value by looking at the return from Interlocked.Decrement(). But keep in mind what I said about pre-incrementing.
Steven Sudit
Another thing is that I don't think that catching ThreadAbortException will work the way you have it. You would need the try/catch to be inside the delegate itself, not surrounding the entire method. I'm also not sure that you WANT to catch it, since it's not clear what you can do about it or whether anyone will ever try to cancel a thread from the pool.
Steven Sudit
The try/catch is there as a remnant from when there was no thread pool (i.e. it was just one separate thread that did all the processing). The main thread calls Abort on the processing thread to kill it if it is still going when the user attempts to close the program (so the thread, and hence the program, doesn't keep going). The catch is just there so that the exception doesn't bubble up to the level where it's reported to the user.
Matthew Scharley
Ah, ok. You could also just set the thread to Background, if you're not using the pool.
Steven Sudit
"I've seen quite a few .NET programmers incorrectly use the background thread to mean any thread created using the Thread constructor. The terminology is therefore getting very confusing. The correct meaning of background thread in .NET framework is a thread that does not have impact on whether a process is terminated." -- http://www.developerfusion.com/article/4272/net-threading-part-ii/6/ Ahh, this is a good point... Thanks!
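As a sketch, the dedicated-thread variant would only need this (names illustrative):

```csharp
Thread worker = new Thread(CompileEverything);
worker.IsBackground = true; // the process can exit without waiting for it
worker.Start();
```

Note that ThreadPool threads are already background threads, so the pool version gets this behaviour for free.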
Matthew Scharley
Glad to help. If you like what the thread pool has done for you, you might want to look into more sophisticated multiprocessing architectures, such as producer/consumer queues. They give you more control, and handle certain complex data-manipulation problems a lot more cleanly and reliably.
Steven Sudit
I've already got a producer/consumer queue elsewhere in this codebase (namely the part that actually manipulates and uses the triggers that are loaded in this snippet). If you're at all curious, I had a question about that part of the code over here: http://stackoverflow.com/questions/1116249
Matthew Scharley
Oh, I see. Well, I have nothing to add to Richard's answer there, but I'm glad you're doing your best to take full advantage of parallelism. A lot of this will get easier with VS 2010, since it includes things like a lock-free queue.
Steven Sudit