Specialist or generalist threads.

Hi, I'm working on a system where each object goes through a series of steps:

1st. Mostly database queries

2nd. Mostly disk I/O and XML parsing

3rd. Mostly web service communication

4th. Mostly XML serialization and deserialization

5th. Some optional work

The system needs to process several thousand objects per hour, so I'll be using a lot of threading. My question is: what's the best approach?

  • Specialist threads for each step: for example, 5 threads per step. Threads on the 1st step pick up objects, work on them, and update their status, so that a specialist thread on the 2nd step can pick those objects up and continue from there.

  • All generalist threads: each thread takes an object at step 1 and carries it through to the end of step 5.
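
For illustration, the generalist option could look roughly like the sketch below. It is only a minimal outline in Python: the step functions are hypothetical placeholders that record their name, and the worker count of 8 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def make_step(name):
    """Build a placeholder step (hypothetical) that only records its name."""
    def step(obj):
        obj["steps"].append(name)
        return obj
    return step


# Stand-ins for the five steps described above.
STEPS = [make_step(n) for n in
         ("db", "disk_xml", "webservice", "xml_serialize", "optional")]


def process(obj):
    """Generalist: one thread carries a single object through every step."""
    for step in STEPS:
        obj = step(obj)
    return obj


def run_generalist(objects, workers=8):
    # pool.map hands each object to a free thread and keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, objects))


results = run_generalist([{"id": i, "steps": []} for i in range(20)])
```

Here no object is ever shared between threads, so no status flags or hand-off queues are needed.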

+1  A: 

Coincidentally, we had a similar discussion a while ago. We came up with these points to consider before deciding:

  1. How long does a step take on average? -> If each step takes very little time, it is generally better to have one thread do all the steps; otherwise context switching becomes an overhead.
  2. Is each step highly domain specific? -> If so, it is better to keep them in separate threads. Some people argue that just separating the execution code is enough, but I think that's not always so. For example, a particular thread might need special privileges or a higher priority.
  3. Cost of context switches? -> No need to explain.
  4. Threading model and resources -> For example, suppose your system has run out of threads and a higher-priority request comes in. Will you drop low-priority tasks to serve that request?

There are some more points, which I will add in comments as I remember them.

Suraj Chandran
1. Only the web service step takes longer than 3 seconds. 2. No, all steps belong to the same domain.
Rafael Mueller
+1  A: 

I can imagine that you may want to throttle the number of simultaneous DB and WS calls, so you may benefit from having different amounts of concurrency at different stages of your pipeline. For that reason I might well consider specialists. However, that would tend to increase the overall complexity of the solution, so I would start by building and performance testing the generalist approach. If you get your desired throughput, keep it simple and leave well alone.
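
To make the throttling idea concrete, here is a rough sketch of a specialist pipeline in Python where each stage gets its own number of threads. The names `run_pipeline`, `stage_worker` and `tag`, and the worker counts, are assumptions made for the example; it also has no error handling.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker thread to exit


def stage_worker(step, inbox, outbox):
    """Specialist: apply one step to everything arriving on inbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        outbox.put(step(item))


def run_pipeline(objects, stages):
    """stages is a list of (step_function, worker_count) pairs."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = []
    for i, (step, count) in enumerate(stages):
        pool = [threading.Thread(target=stage_worker,
                                 args=(step, queues[i], queues[i + 1]))
                for _ in range(count)]
        for t in pool:
            t.start()
        pools.append(pool)
    for obj in objects:
        queues[0].put(obj)
    # Stop stages front to back, so each one drains its input first.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(STOP)
        for t in pool:
            t.join()
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


def tag(name):
    """Placeholder step (hypothetical) that appends its name to the object."""
    return lambda obj: obj + [name]


# At most 2 concurrent "db" calls and 4 concurrent "ws" calls.
results = run_pipeline([[i] for i in range(10)],
                       [(tag("db"), 2), (tag("ws"), 4), (tag("xml"), 1)])
```

The per-stage worker count is the throttle: no more than that many objects can be inside a given step at once. The price is the extra queues and shutdown logic, and results no longer come back in input order.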

djna
+2  A: 

Some things to consider

  • Failure mode: What happens if a step fails? Does the work already performed have to be retried or discarded? Will there be failure modes where threads die and get recreated, or will threads live forever? If the work has to be retried, specialist threads make more sense, because on failure the object simply gets re-added to that step's queue.

  • Coordination: In the specialist scenario, objects will usually live longer in the shared structure if the steps are strictly serial, possibly hurting your throughput. So, if all steps are strictly serial, it's easier to use generalist threads and reduce the coordination effort.
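
As a rough illustration of the retry case, the sketch below shows one specialist stage in Python that puts a failed object back on its own queue. The limit of 3 attempts, the names `retrying_worker` and `run_stage`, and the deliberately flaky demo step are all assumptions made for the example.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumed retry limit
STOP = object()   # sentinel telling a worker thread to exit


def retrying_worker(step, inbox, outbox, failed):
    while True:
        item = inbox.get()
        if item is STOP:
            inbox.task_done()
            break
        obj, attempts = item
        try:
            outbox.put(step(obj))
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                inbox.put((obj, attempts + 1))  # re-add for another try
            else:
                failed.put(obj)                 # give up on this object
        finally:
            inbox.task_done()


def drain(q):
    items = []
    while not q.empty():
        items.append(q.get())
    return items


def run_stage(step, objects, workers=3):
    inbox, outbox, failed = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=retrying_worker,
                                args=(step, inbox, outbox, failed))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for obj in objects:
        inbox.put((obj, 0))
    inbox.join()  # returns once every object succeeded or was given up on
    for _ in threads:
        inbox.put(STOP)
    for t in threads:
        t.join()
    return sorted(drain(outbox)), sorted(drain(failed))


seen = set()
seen_lock = threading.Lock()


def flaky_step(n):
    """Demo step: 7 always fails, odd numbers fail on their first attempt."""
    if n == 7:
        raise ValueError("permanent failure")
    with seen_lock:
        first_time = n not in seen
        seen.add(n)
    if first_time and n % 2:
        raise IOError("transient failure")
    return n * 10


done, gave_up = run_stage(flaky_step, range(10))
```

Because the retried object goes back onto the same queue, any free thread of that stage can pick it up, and the earlier steps do not have to be repeated.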

Vinko Vrsalovic
If a step fails, we can either ignore the object and move on to the next one, or redo the step, depending on what caused the failure. I was thinking of threads that live forever.
Rafael Mueller