I'm writing an application that has a multiple producer, single consumer model (multiple threads send messages to a single file writer thread).

Each producer thread contains two queues, one to write into and one for the consumer to read out of. On every loop, the consumer thread iterates through each producer, locks that producer's mutex, swaps the queues, unlocks, and writes out from the queue that the producer is no longer using.
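For illustration, here is a minimal C++ sketch of that scheme (names and types are hypothetical; the real code swaps raw pointers rather than `std::queue` objects, but the locking pattern is the same):

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: each producer owns two queues; the consumer swaps
// them under the producer's mutex, then drains the inactive one with no
// lock held.
struct ProducerSlot {
    std::mutex m;
    std::queue<std::string> write_q;  // producer pushes here
    std::queue<std::string> read_q;   // consumer drains this after a swap
};

void produce(ProducerSlot& slot, std::string msg) {
    std::lock_guard<std::mutex> lock(slot.m);
    slot.write_q.push(std::move(msg));
}

// One consumer pass over all producers.
void consume_pass(std::vector<ProducerSlot>& slots,
                  std::vector<std::string>& out) {
    for (auto& slot : slots) {
        {
            std::lock_guard<std::mutex> lock(slot.m);
            std::swap(slot.write_q, slot.read_q);  // constant-time swap under the lock
        }
        while (!slot.read_q.empty()) {             // drain without holding the lock
            out.push_back(std::move(slot.read_q.front()));
            slot.read_q.pop();
        }
    }
}
```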

In its loop, the consumer thread sleeps for a designated amount of time after it processes all producer threads. One thing I immediately noticed was that the average time for a producer to write something into the queue and return increased dramatically (by 5x) when I moved from 1 producer thread to 2. As more threads are added, this average time decreases until it bottoms out - there isn't much difference between the time taken with 10 producers vs 15 producers. This is presumably because with more producers to process, the consumer visits each one less often, so there is less contention for any individual producer's mutex.

Unfortunately, having fewer than 5 producers is a fairly common scenario for the application, and I'd like to tune the sleep time so that I get reasonable performance regardless of how many producers exist. I've noticed that by increasing the sleep time, I can get better performance for low producer counts, but worse performance for high producer counts.

Has anybody else encountered this, and if so, what was your solution? I have tried scaling the sleep time with the number of threads, but that seems machine-specific and pretty trial-and-error.

A: 

Try to find a blocking queue implementation in the language you're using. A single queue will be enough for any number of producers and one consumer.
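As a rough sketch of what such a blocking queue might look like in C++ (a minimal, assumed implementation for illustration, not any particular library's API):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Minimal blocking queue sketch: producers push under a lock and notify;
// the consumer blocks in pop() until an item is available.
class BlockingQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::string> q;
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(std::move(msg));
        }
        cv.notify_one();
    }
    std::string pop() {  // blocks until an item arrives
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !q.empty(); });
        std::string msg = std::move(q.front());
        q.pop();
        return msg;
    }
};
```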

Boris Pavlović
I'd like to avoid having to force all producers to lock on a single queue, as my main goal is to minimize the time it takes a producer thread to send a message. Moving to a single blocking queue would force this behavior.
Paul D.
A: 

To me it sounds like you are accidentally introducing some buffering by having the consumer thread be busy elsewhere, either sleeping or doing actual work (the queue acts as the buffer). Maybe doing some simple buffering on the producer side will reduce your contention.

It seems that your system is highly sensitive to lock contention between the producers and consumer, but I'm baffled as to why such a simple swap operation would occupy enough CPU time to show up in your run stats.

Can you show some code?

edit: maybe you are taking your lock and swapping queues even when there is no work to do?

Trevor Harrison
Hmm, I'm not entirely sure what you mean. There *is* intentional buffering, in the sense that the queues are swapped by the consumer to minimize the time it must hold a producer's mutex. This is work-related so I don't really want to copy and paste code, but the swap operation just swaps two pointers exactly as you'd expect: temp = queue2, queue2 = queue1, queue1 = temp. Nothing fancy, and the mutex is only held for that operation before being released.
Paul D.
yeah, but if your problem is lock contention, you need buffering (local to each producer) to reduce the number of times you try to acquire the lock.
Trevor Harrison
also, you should consider @George Phillips's answer. maybe you need to change your scheme.
Trevor Harrison
+1  A: 

You could pick the sleep time based on the number of producers, or even make the sleep time adapt based on some dynamic scheme: if the consumer wakes up and has no work, double the sleep time; otherwise halve it. But constrain the sleep time to some minimum and maximum.
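A minimal sketch of that adaptive policy, with illustrative bounds:

```cpp
#include <algorithm>

// Adaptive sleep sketch: double the sleep when a consumer pass finds no
// work, halve it when it does, clamped to [min_ms, max_ms]. The bounds
// here are arbitrary placeholders; tune them for the real workload.
long next_sleep_ms(long current_ms, bool found_work) {
    const long min_ms = 1, max_ms = 64;
    long next = found_work ? current_ms / 2 : current_ms * 2;
    return std::max(min_ms, std::min(max_ms, next));
}
```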

Either way, you're papering over a more fundamental issue. Sleeping and polling is easy to get right and is sometimes the only approach available, but it has many drawbacks and isn't the "right" way.

You can head in the right direction by adding a semaphore which is incremented whenever a producer adds an item to a queue and decremented when the consumer processes an item in a queue. The consumer will only wake up when there are items to process and will do so immediately.
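A rough C++ sketch of such a counting semaphore (the standard library only gained semaphores in C++20, so this builds one from a mutex and condition variable; all names are illustrative):

```cpp
#include <condition_variable>
#include <mutex>

// Counting semaphore sketch: producers release() once per queued item,
// the consumer acquire()s once per item it processes, so the consumer
// blocks exactly when there is nothing to do.
class Semaphore {
    std::mutex m;
    std::condition_variable cv;
    long count = 0;
public:
    void release() {  // producer: an item was added
        std::lock_guard<std::mutex> lock(m);
        ++count;
        cv.notify_one();
    }
    void acquire() {  // consumer: block until an item exists
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return count > 0; });
        --count;
    }
    long available() {  // current count, for inspection
        std::lock_guard<std::mutex> lock(m);
        return count;
    }
};
```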

Polling the queues may still be a problem, though. You could add a new queue that refers to any queue which has items on it. But it rather raises the question as to why you don't have a single queue that the consumer processes rather than a queue per producer. All else being equal that sounds like the best approach.

George Phillips
+1: fundamental issues should be talked about first
Trevor Harrison
The reason I didn't do a single queue was that my main priority was being able to have the producer write into the queue and be finished as quickly as possible (the code is for a logging library). Maybe I'm overthinking this and a single queue *would* be just as fast - but ideally the logger would support anywhere from a few to 40-50 threads. In the many-queue case, there are only ever 2 threads contending for the same resource, instead of 40-50 threads contending for one queue lock. I do like the idea of a semaphore, though; I agree that's probably the "correct" way of going about things.
Paul D.
+1  A: 

Instead of sleeping, I would recommend that your consumer block on a condition signaled by the producers. On a POSIX-compliant system, you could make this work with pthread_cond. Create an array of pthread_cond_t, one for each producer, then create an additional one that is shared among them. The producers first signal their individual condition variable, and then the shared one. The consumer waits on the shared condition and then iterates over the array, performing a pthread_cond_timedwait() on each element (use pthread_get_expiration_np() to get the absolute time for "now"). If the wait returns 0, then that producer has written data. The consumer must reinitialize the condition variables before waiting again.

By using blocking waits, you'll minimize the amount of time the consumer needlessly locks out the producers. You could also make this work with semaphores, as stated in a previous answer. Semaphores have simpler semantics than condition variables, in my opinion, but you'd have to be careful to decrement the shared semaphore once for each producer that was processed on each pass through the consumer loop. Condition variables have the advantage that you can basically use them like boolean semaphores if you reinitialize them after they are signaled.
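Since condition variables carry no state of their own, a portable way to sketch this idea in C++ is to pair one shared condition variable with per-producer "has data" flags instead of reinitializing conditions - an analogue of the scheme above rather than a literal translation, with all names illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

// Blocking-wait sketch: each producer sets its flag under a shared mutex
// and signals one shared condition; the consumer sleeps until some flag is
// set, then services exactly the flagged producers and clears their flags.
struct WakeupBoard {
    std::mutex m;
    std::condition_variable cv;
    std::vector<bool> has_data;

    explicit WakeupBoard(int producers) : has_data(producers, false) {}

    void signal(int producer) {  // producer: after enqueueing an item
        {
            std::lock_guard<std::mutex> lock(m);
            has_data[producer] = true;
        }
        cv.notify_one();
    }

    // consumer: block until at least one producer has data, then return
    // the indices of the flagged producers and reset their flags.
    std::vector<int> wait_for_work() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] {
            for (bool f : has_data)
                if (f) return true;
            return false;
        });
        std::vector<int> ready;
        for (int i = 0; i < (int)has_data.size(); ++i) {
            if (has_data[i]) {
                ready.push_back(i);
                has_data[i] = false;
            }
        }
        return ready;
    }
};
```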

Adam