views:

385

answers:

4

We have a large project coming up soon with quite a lot of media processing (Images, Video) as well email output etc, the sort of stuff normally we'd put into a table called "email_queue" and we use a cron to run a script process the queue in the table.

I have been reading a lot on Message Queue systems like beanstalkd, and have even set it up. It was easy and nice to use, the problem is that I am unsure whether I am missing something.

Could someone detail the benefits of using a queue system rather than a table and a CRON? Since I really can't see to see what they are.

Thanks

+1  A: 

A message queue (a distributed one at least, e.g. RabbitMQ) gives you the ability to distribute work across physical nodes. You still need to have a process on each node to dequeue work and process it.

It gets down ultimately to your requirements I guess. You can achieve a more manageable solution at scale with using message queues: you can decouple your nodes more easily.

Of course, there is a learning curve... so it again comes back to your target goals.


Note that on each node you can still reuse your cron/db table until (and if) you wish to change the implementation. That's what great about decoupling when you can.

jldupont
Hi, I sort of understand that, but couldn't you do the same thing with a table/cron, connect to db remotely and run the cron on another machine?
Bowen
You could of course but then this appears to be a more "coupled solution". WIth a message queue approach, you are more decoupled for the node implementations. This can be a Good Thing.
jldupont
Coupled beacuse it uses database x with format y, or becase it uses queue a with format b? Nice question @bowen.
graffic
+2  A: 

Differences:

  1. Once a message is put on the queue it can be immediately delivered. So if your cron normally ran every 5 minutes, you could process faster with the queuing.

  2. If your queueing system supports transactions, then it will automatically re-deliver a message if the processing fails.

  3. It can be harder to query what is in your queue. A database table has a nice way to search (sql).

  4. If you have multiple servers/processes/threads handling messages, the queue system will make sure a message is only delivered to one of them. With a DB table you need to handle this via application code (locking, flags, etc ...)

Dave
A: 

First, queues are often backed by actual DB tables and can maintain message durability. That aside, the queue is a natural way to shove off work that needs to be done asynchronously, which if you design on that principal from the start is very powerful.

Other than the fact that a table (entity) has a set of hard columns (attributes), both this table being composed of a set of records composing as well as a queue are nothing more than lists of stuff You are employing the queue-as-a-table as a formal queue, just that you are polling it on a regular (cron) basis.

MQs add another nifty feature though of generally synchronizing access to the message itself (you may or may not be doing this in your SQL to get the next thing).

I like to consider the cron/table mechanism as POLL-based and the MQ as EVENT-based.

Benefit of a queue in my opinion is that it takes care of the sync'ing, status updating. MQs can be set up to "broadcast" (topic) or make available the message to a group of consumers or listeners.

MQs though asynchronous would likely operate between your cron window. How do you know that the number of messages you process in your table can be accomplished before the next cron job runs and tries to step on the previous job?

Multiple consumers for the MQ allows you to scale the work as you see fit. In the example above if you saw that your load average (just the same in the OS' process queue) is greater than you like, you can provision another consumer to handle said load, bringing it on and offline as metrics demand.

MQs can be set up to have different operational parameters such as message priority and performance (some queues can remain in memory, others persist to disk).

Downside is that (as already mentioned) that the queue can sometimes be hard to query and for which to obtain metrics. I always find MQ systems that have a DB backing store so that I can myself watch the queue with SQL.

Xepoch
A: 

This gets asked fairly frequently, and there's usually not a compelling reason to go MQ if you're comfortable with databases. Here's one example thread.

My take is that you might want to avoid the learning curve unless your data requirements include exceptionally high volumes, which is unlikely if you're thing cron rather than a process with a timer (much less multiple processes with timers.)

le dorfier