views:

81

answers:

4

I am redesigning the messaging system for my app to use Intel Threading Building Blocks (TBB), and I am stumped trying to decide between two possible approaches.

Basically, I have a sequence of message objects and, for each message type, a sequence of handlers. For each message object, I apply each handler registered for that message object's type.


The sequential version would be something like this (pseudocode):

for each message in message_sequence                     <- SEQUENTIAL
    for each handler in (handler_table for message.type)
        apply handler to message                         <- SEQUENTIAL
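As a concrete reference point, the sequential baseline might look like the following C++ sketch. The types `Message`, `Handler`, and `HandlerTable` are hypothetical stand-ins for whatever the app actually uses, not real API:

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal types for illustration only.
struct Message { std::string type; int payload; };
using Handler = std::function<void(const Message&)>;
using HandlerTable = std::unordered_map<std::string, std::vector<Handler>>;

// Sequential baseline: visit messages in arrival order and apply each
// handler registered for the message's type, one after another.
void dispatch_sequential(const std::vector<Message>& messages,
                         const HandlerTable& table) {
    for (const Message& msg : messages) {
        auto it = table.find(msg.type);
        if (it == table.end()) continue;   // no handlers for this type
        for (const Handler& h : it->second)
            h(msg);                        // strictly FIFO, fully sequential
    }
}
```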

The first approach which I am considering processes the message objects in turn (sequentially) and applies the handlers concurrently.

Pros:

  • predictable ordering of messages (i.e., we are guaranteed a FIFO processing order)
  • (potentially) lower latency of processing each message

Cons:

  • typically more processing resources available than handlers for a single message type (poor parallelization)
  • bad use of processor cache since message objects need to be copied for each handler to use
  • large overhead for small handlers

The pseudocode of this approach would be as follows:

for each message in message_sequence                              <- SEQUENTIAL
    parallel_for each handler in (handler_table for message.type)
        apply handler to message                                  <- PARALLEL

The second approach is to process the messages in parallel and apply the handlers to each message sequentially.

Pros:

  • better use of processor cache (keeps the message object local to all handlers which will use it)
  • small handlers don't impose as much overhead (as long as there are other handlers also to be run)
  • more messages are expected than there are handlers, so the potential for parallelism is greater

Cons:

  • Unpredictable ordering - if message A is sent before message B, they may both be processed at the same time, or B may finish processing before all of A's handlers are finished (order is non-deterministic)

The pseudocode is as follows:

parallel_for each message in message_sequence                     <- PARALLEL
    for each handler in (handler_table for message.type)
        apply handler to message                                  <- SEQUENTIAL

The second approach has more advantages than the first, but its non-deterministic ordering is a big disadvantage.

Which approach would you choose and why? Are there any other approaches I should consider (besides the obvious third approach: parallel messages and parallel handlers, which has the disadvantages of both and no real redeeming factors as far as I can tell)?

Thanks!

EDIT:

I think what I'll do is use #2 by default, but allow a "conversation tag" to be attached to each message. Any messages with the same tag are ordered and handled sequentially relative to their conversation. Handlers are passed the conversation tag alongside the message, so they may continue the conversation if they need to. Something like this:

Conversation c = new_conversation()
send_message(a, c)
...
send_message(b, c)
...
send_message(x)

handler foo (msg, conv)
    send_message(z, conv)

...
register_handler(foo, a.type)

a is handled before b, which is handled before z. x can be handled in parallel to a, b and z. Once all messages in a conversation have been handled, the conversation is destroyed.

A: 

The first method also has unpredictable ordering: the processing of message 1 on thread 1 could take very long, making it possible that messages 2, 3, and 4 have long since been processed.

This would tip the balance towards method 2.

Edit: I see what you mean.

However, why would you run the handlers sequentially in method 2? In method 1 the ordering of handlers doesn't matter, and you're fine with that.

E.g., method 3: handle both the messages and the handlers in parallel.

Of course, here also, the ordering is unguaranteed.

If the handlers produce results, you could store those results in an ordered list, restoring the ordering after the fact.

Toad
No, because message 2 will not be processed until all handlers of message 1 (which happen to run in parallel) are done; i.e., all threads are busy processing message 1 (or idle, if there are fewer handlers than threads). Or where is the hole in my logic?
Dan
Well, if ordering doesn't matter, then method 2 or even method 3 would be the better one. Method 3 isn't good because it has the disadvantage of method 2 (unordered messages) as well as the disadvantages of method 1 (bad use of caching, large overhead for small handlers - the only thing that's improved is the potential parallelism). As for re-ordering results later, that only works if the message handlers are side-effect free.
Dan
I don't think the order of handlers would ever matter, but the order of messages does (or can - it's not in my control, since I'm using this for a plugin system).
Dan
+1  A: 

I suppose it comes down to whether or not the order is important. If the order is unimportant, you can go for method 2. If the order is important, you go for method 1. Depending on what your application is supposed to do, you can still go for method 2 but use a sequence number, so all the messages are processed in the correct order (unless of course it is the processing part you are trying to optimize).
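The sequence-number idea could look like the following reorder buffer: parallel workers may finish out of order, but results are only released once every earlier sequence number has arrived. The class name and `int` result type are illustrative assumptions:

```cpp
#include <map>
#include <vector>

// Reorder buffer: workers complete in any order, consumers see results
// strictly in sequence-number order.
class Reorderer {
public:
    // Accept the result for sequence number `seq`; return every result
    // that has now become releasable, in order.
    std::vector<int> push(int seq, int result) {
        pending_[seq] = result;
        std::vector<int> ready;
        while (!pending_.empty() && pending_.begin()->first == next_) {
            ready.push_back(pending_.begin()->second);
            pending_.erase(pending_.begin());
            ++next_;
        }
        return ready;
    }

private:
    std::map<int, int> pending_;  // results waiting on earlier numbers
    int next_ = 0;                // next sequence number to release
};
```

Note the caveat raised in the comments below: releasing results in order only restores ordering if the handlers themselves are side-effect free.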

Arkain
Ordering is definitely desirable, since there are plenty of situations I can think of where ordering would make life much, much easier. I definitely want to optimize performance, but I can accept a performance hit in exchange for other benefits (ease of use, asynchronous message handling to prevent a message handler from hogging resources, safety, whatever other benefits may be possible). I will look at other options, such as letting the senders or handlers decide how messages are dealt with, or coming up with a more elaborate (and intelligent) method of scheduling handlers.
Dan
+1  A: 

If possible I would go for number two with some tweaks. Do you really need every message to be in order? I find that to be an unusual case. Some messages just need to be handled as soon as possible, while others need to be processed before some particular message, but not before every message.

If some messages have to be in order, then mark them in some way. You can mark them with a conversation code that lets the processor know each one must be processed in order relative to the other messages in that conversation. Then you can process all conversation-less messages and one message from each conversation concurrently.

Give your design a good look and make sure that only messages that need to be in order are.

Chris
I was about to comment asking how to handle third-party modules imposing order on messages sent by other modules (which they may not have access to), but I figured it out: pass a conversation object along with the message to the handlers and allow them to add more messages to it. Messages in the same conversation are sequential; all others are processed in parallel as in #2 (or even the handlers are called within TBB tasks. I guess task stealing means they run sequentially (making better use of the cache) unless other threads are idle, in which case they run in parallel). Sound reasonable?
Dan
+1  A: 

I'd suggest something different altogether: don't push work to the threads. Have the threads pull work when they finish their previous work.

Maintain a fixed number of worker threads (the optimal amount equals the number of CPU cores in the system) and have each of them sequentially pull the next task from the global queue after finishing the previous one. Obviously, you would need to keep track of dependencies between messages and defer handling of a message until its dependencies are fully handled.

This could be done with very low synchronization overhead - possibly with atomic operations only, no heavy primitives like mutexes or semaphores.
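A minimal sketch of such a pull-based pool, ignoring the dependency tracking mentioned above: a fixed set of threads shares one atomic counter into a task list, and each thread claims the next index when it becomes idle, at a cost of one `fetch_add` per task. The function name and fixed task vector are assumptions for illustration:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Pull-based worker pool: workers claim tasks via an atomic counter
// instead of having work assigned to them. No mutexes or semaphores.
void run_pull_pool(const std::vector<std::function<void()>>& tasks,
                   unsigned num_threads) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_threads; ++i)
        workers.emplace_back([&] {
            for (std::size_t j = next.fetch_add(1); j < tasks.size();
                 j = next.fetch_add(1))
                tasks[j]();    // pull the next task when idle
        });
    for (std::thread& t : workers)
        t.join();
}
```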

Also, if you pass a message to each handler by reference, instead of making a copy, having the same message handled simultaneously by different handlers on different CPU cores can actually improve cache performance, as higher levels of cache (usually from L2 upwards) are often shared between CPU cores - so when one handler reads a message into the cache, the other handler on the second core will have this message already in L2. So think carefully - do you really need to copy the messages?

slacker
My message objects are immutable and shared between handlers, which is why running handlers sequentially (approach 2) is better for caching, since all handlers using the same message will then be running in the same thread and therefore able to share the cache. My goal has always been to share immutable objects. You gave me an idea though with task dependency handling. Might make for a convenient approach.
Dan