Is there any inherent advantage to using multiple workers that each process one piece of a procedural workflow, versus workers that each process the entire load?

In other words, if my workflow looks like this:

  1. Get work from queue0 and do A
  2. Store result from A in queue1
  3. Get result from queue1 and do B
  4. Store result from B in queue2
  5. Get result from queue2 and do C

Is there an inherent advantage to using 3 workers who each do the entire process themselves versus 3 workers that each do a part of the work (worker 1 does steps 1 and 2, worker 2 does steps 3 and 4, worker 3 does step 5)?

If we only care about work being done (finished with step 5), it would seem that both setups scale the same way (once you're using at least 3 workers). Maybe the big job is better because workers with that setup have fewer bottleneck issues?

A: 

In general, the smaller the jobs are, the less work you lose when some process crashes. Also, the smaller the jobs are, the more evenly you'll be able to distribute the work. (Instead of at one point having a single worker instance doing a long job and all the others idle, you'd have all the worker instances doing small pieces of work.)
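A rough sketch of the load-distribution point, assuming each job goes to whichever worker currently has the least work. The function name `makespan` and the job durations are made up for illustration; both job lists add up to the same 12 units of work.

```python
import heapq

def makespan(job_durations, n_workers):
    """Time until the last worker finishes, with greedy assignment."""
    loads = [0] * n_workers
    heapq.heapify(loads)
    for duration in job_durations:
        # Hand the next job to the least-loaded worker.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)

coarse_jobs = [9, 1, 1, 1]   # one long job plus a few short ones (hypothetical)
fine_jobs = [1] * 12         # the same total work cut into equal small pieces

print(makespan(coarse_jobs, 3))
print(makespan(fine_jobs, 3))
```

With the coarse list, one worker is stuck on the long job while the other two run out of work early; with the fine list, the three workers finish together.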

Setting aside how to break up the work into smaller pieces, there's a question of whether there should be multiple worker roles, each of which can only do one kind of work, or a single worker role (but many instances) that can do everything. I would default to the latter (code that can do everything and just checks all the queues to see what needs to be done), but there are reasons to go with the former. If you need more RAM for one kind of work, for example, you might use a bigger VM size for that worker. Another example is if you wanted to scale the different kinds of work independently.
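A minimal sketch of the "code that can do everything and just checks all the queues" approach, using in-process `queue.Queue` objects as stand-ins for the real queues. The step functions `step_a`, `step_b`, `step_c` and the helper `work_once` are hypothetical placeholders for A, B and C, not part of any particular framework.

```python
import queue

# Placeholder implementations of steps A, B and C (assumptions for the sketch).
def step_a(x): return x + 1
def step_b(x): return x * 2
def step_c(x): return x - 3

queue0, queue1, queue2, done = (queue.Queue() for _ in range(4))

# (input queue, handler, output queue). Later stages are listed first so
# work that is already in progress gets finished before new work is started.
stages = [
    (queue2, step_c, done),
    (queue1, step_b, queue2),
    (queue0, step_a, queue1),
]

def work_once(stages):
    """Check every queue in order; process at most one item. Returns False if all are empty."""
    for source, handler, sink in stages:
        try:
            item = source.get_nowait()
        except queue.Empty:
            continue
        sink.put(handler(item))
        return True
    return False

for n in (1, 2, 3):
    queue0.put(n)

while work_once(stages):   # one multipurpose worker draining everything
    pass

results = [done.get_nowait() for _ in range(done.qsize())]
print(results)
```

Any number of instances could run the same loop. A specialized role would simply be handed a shorter `stages` list.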

smarx
A: 

Adding to what @smarx says:

  • The model of a "multipurpose" worker is of course more general. Even if you require a specialized role (like the extra RAM example used above), you can treat it as a multipurpose worker that simply has a single task assigned to it.

  • There's the extra perspective of cost. You will have an economic incentive to increase the "task density" (as in tasks/instance). If you have M types of work and you assign each one to a different worker, then you will pay for M instances, even if some of those might only do some work every once in a while.
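As a back-of-the-envelope illustration of the cost point, here is a small sketch. The utilization figures are invented, and the model assumes any instance can run any task type and ignores headroom and redundancy.

```python
import math

def dedicated_instances(utilizations):
    """One role per task type: every type pays for at least one instance."""
    return sum(max(1, math.ceil(u)) for u in utilizations)

def shared_instances(utilizations):
    """Multipurpose workers: instances are sized for the combined load."""
    return max(1, math.ceil(sum(utilizations)))

# Hypothetical fraction of one instance that each of M = 3 task types keeps busy.
load = [0.6, 0.1, 0.05]

print(dedicated_instances(load))
print(shared_instances(load))
```

With these made-up numbers, the dedicated layout pays for three mostly idle instances, while the shared layout fits the same work on one.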

I blogged about this some time ago, and it is one of the topics in our guide (chapter "06 week3.docx").

Many frameworks and samples (including ours) use this approach.

Eugenio Pace