views:

358

answers:

4

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.

Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time. So, the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running poller.

We're looking at EventMachine as an alternative. It has the advantage of allowing us to kickoff large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.

But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine's periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistance/reliable delivery layer on top of EventMachine.

Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?

+1  A: 

I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though not perhaps at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further o the active messaging list, or via email if you like.

One trick is that the poller is not the only serialized part of this. STOMP subscriptions, if you do client -> ack in order to prevent losing messages on interrupt, will only get sent a new message on a given connection when the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.

So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole rails env per poller, and one poller is one connection. But there is nothing magical about the current poller, I could imagine writing a poller as an event machine client that is implemented to create new connections to the broker and get many messages at once.

In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master thread that forks many poller worker threads so as to get the benefit of the reduced memory footprint (much like passenger does), but I think the EM trick could work as well.

I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.

http://github.com/defunkt/resque

Andrew Kuklewicz
+2  A: 

I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which although it seems like an awesome package, I suspect may not be appropriate for this use case.

Here's how you could do it, using AMQP:

  • Have Rails process send a message saying "get info for user i".
  • The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
  • The worker then spins off the message into the thousands of small requests to SalesForce.
  • When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
  • Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already performed, I imagine this should be fine.

If you're using (C)Ruby, try to never combine synchronous and asynchronous stuff in a single process. A process should either do everything via Eventmachine, with no code blocking, or only talk to an Eventmachine process via a message queue.

Also, writing asynchronous code is incredibly useful, but also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.

stephenjudkins
+1  A: 

also checkout "cramp" and "beanstalk"

rogerdpack
A: 

Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.

Clint Miller