Can a shared ready queue limit the scalability of a multiprocessor system?
That's an interesting question. I wonder if anyone has written anything about that.
Simply put, most definitely. Read on for some discussion.
Tuning a service like this is an art form, or at least requires benchmarking (and the space of configurations you would need to benchmark is huge). I believe it depends on factors such as the following (not an exhaustive list; a small benchmark sketch follows it):
- how long an item picked up from the ready queue takes to process,
- how many worker threads there are,
- how many producers there are, and how often they produce, and
- what kind of waiting primitives you use: spin-locks or kernel waits (the latter being slower).
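As a rough illustration of how those knobs interact, here is a minimal, hypothetical benchmark sketch in plain Python. It uses multiprocessing so the workers can actually run on separate cores; the worker count, producer count, item count, and per-item cost are made-up parameters that you would sweep in a real benchmark.

```
import multiprocessing as mp
import time

N_PRODUCERS = 2          # assumed values -- sweep these in a real benchmark
N_WORKERS = 4
ITEMS_PER_PRODUCER = 10_000
WORK_SECONDS = 0.0001    # stand-in for per-item processing cost

def producer(q):
    for i in range(ITEMS_PER_PRODUCER):
        q.put(i)                     # every put contends on the shared queue

def worker(q):
    while True:
        item = q.get()               # every get contends on the same queue
        if item is None:             # sentinel: no more work
            return
        time.sleep(WORK_SECONDS)     # simulate processing the item

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=worker, args=(q,)) for _ in range(N_WORKERS)]
    producers = [mp.Process(target=producer, args=(q,)) for _ in range(N_PRODUCERS)]
    start = time.perf_counter()
    for p in workers + producers:
        p.start()
    for p in producers:
        p.join()
    for _ in workers:
        q.put(None)                  # one sentinel per worker
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    total = N_PRODUCERS * ITEMS_PER_PRODUCER
    print(f"{total} items in {elapsed:.2f}s -> {total / elapsed:.0f} items/s")
```

If you shrink WORK_SECONDS or add workers and the items/s figure stops improving, the shared queue is the bottleneck, which is exactly the scalability limit the question asks about.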
So, if items are produced frequently, the number of threads is large, and the per-item processing time is short, the queue's lock can be contended for long stretches, causing threads to thrash on it rather than do useful work.
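To make the spin-lock versus kernel-wait distinction concrete, here is one way to express both strategies on top of a plain threading.Lock. This is only a sketch of the shape of the two approaches (the function names are my own), since CPython's lock already falls back to a kernel wait when it blocks.

```
import threading

lock = threading.Lock()

def spin_acquire(lk):
    # Busy-wait: keep retrying without sleeping. Cheap if the lock is only
    # held for a few instructions, wasteful (and cache-unfriendly) otherwise.
    while not lk.acquire(blocking=False):
        pass

def kernel_acquire(lk):
    # Blocking wait: the thread is descheduled until the lock is released.
    # Slower to wake up, but it does not burn a core while waiting.
    lk.acquire()
```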
Other factors include the data structure used to back the queue and how long it stays locked per operation. For example, if you use a linked list to manage the queue, the add and remove operations take constant time, whereas a priority queue (a heap) needs more work per insert, up to O(log n) in the worst case.
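A tiny sketch of that difference, using Python's deque and heapq as stand-ins for a linked-list queue and a heap-based priority queue (the per-operation cost is what determines how long the structure would stay locked):

```
from collections import deque
import heapq

fifo = deque()
fifo.append(("job", 42))      # O(1) enqueue
job = fifo.popleft()          # O(1) dequeue

prio = []
heapq.heappush(prio, (5, "low priority job"))   # up to O(log n) insert
heapq.heappush(prio, (1, "urgent job"))
_, urgent = heapq.heappop(prio)                  # O(log n) removal of the minimum
```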
If your system is for business processing, you could take this question out of the picture entirely by using:
- a process-based architecture, spawning multiple producer and consumer processes and using the file system for communication (sketched below), or
- a language with non-preemptive, cooperative threading, such as Stackless Python, Lua, or Erlang.
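Here is a bare-bones sketch of the file-system-as-queue idea. The spool directory name, file naming scheme, and polling interval are all arbitrary choices, and a production version would need locking or claiming logic if you run more than one consumer.

```
import time
import uuid
from pathlib import Path

SPOOL = Path("spool")          # hypothetical spool directory shared by all processes
SPOOL.mkdir(exist_ok=True)

def produce(payload: str) -> None:
    # Write to a temporary name first, then rename: on POSIX the rename is
    # atomic, so a consumer never sees a half-written job file.
    tmp = SPOOL / f".{uuid.uuid4().hex}.tmp"
    tmp.write_text(payload)
    tmp.rename(SPOOL / f"{uuid.uuid4().hex}.job")

def consume_forever() -> None:
    while True:
        jobs = sorted(SPOOL.glob("*.job"))
        if not jobs:
            time.sleep(0.1)    # arbitrary polling interval
            continue
        for job in jobs:
            payload = job.read_text()
            job.unlink()       # claim the job by deleting it (single consumer only)
            print(f"processed: {payload}")
```

Each producer and consumer is an ordinary OS process, so there is no shared in-memory ready queue to contend on; the trade-off is latency and file-system overhead.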
Also note: synchronization primitives cause inter-processor cache-coherence traffic, which hurts scalability, so they should be used sparingly.
The discussion could go on to fill a Ph.D. dissertation :D