views:

297

answers:

7

Hello all,

I have a web application that currently sends emails. When it sends an email (sending is triggered by user actions, not automatic), it also has to run other processes, like zipping up files.

I am trying to make my application "future proof" - when there are a large number of users, I don't want the server strained. So I thought of putting the emails that need to be sent and the files that need to be zipped into a queue: put them in a table, then use a cron job to check every second and execute them (x rows at a time).

Is the above a good idea? Or is there a better approach? I really need help to get this done properly to save myself headaches later on :)
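To make the idea concrete, here's a rough sketch of what I have in mind (table and function names are just placeholders, and I'm using SQLite only for brevity):

```python
import json
import sqlite3

def make_queue(conn):
    # one row per pending job; "kind" distinguishes email vs zip work
    conn.execute("""CREATE TABLE IF NOT EXISTS job_queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        kind TEXT NOT NULL,           -- 'email' or 'zip'
        payload TEXT NOT NULL,        -- JSON-encoded job details
        status TEXT DEFAULT 'pending')""")

def enqueue(conn, kind, payload):
    # the web app calls this during the user's request, then returns fast
    conn.execute("INSERT INTO job_queue (kind, payload) VALUES (?, ?)",
                 (kind, json.dumps(payload)))

def claim_batch(conn, limit=10):
    # the cron worker grabs up to `limit` pending rows per run
    rows = conn.execute(
        "SELECT id, kind, payload FROM job_queue "
        "WHERE status = 'pending' ORDER BY id LIMIT ?", (limit,)).fetchall()
    for job_id, _, _ in rows:
        conn.execute("UPDATE job_queue SET status = 'running' WHERE id = ?",
                     (job_id,))
    return rows
```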

Thanks all

+5  A: 

It's a good approach, but the most important thing you can do right now is have a clear interface for queuing up the messages, and one for consuming the queue. Don't make the usage on either end hard-coded to a DB.

Later on, if this becomes a bottleneck, you may want the mail sending to be done from a different machine which may not even have access to the DB, so this tiny investment up front will give you options later.
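A minimal sketch of what such an interface could look like (names are illustrative) - producers and consumers only ever see `put` and `take`, never the storage behind them:

```python
from abc import ABC, abstractmethod

class MailQueue(ABC):
    """Narrow interface: neither end is hard-coded to a DB."""
    @abstractmethod
    def put(self, message): ...
    @abstractmethod
    def take(self, limit): ...

class InMemoryMailQueue(MailQueue):
    # simplest possible backend; a DB-, file-, or message-broker-backed
    # implementation can replace it later without touching callers
    def __init__(self):
        self._items = []
    def put(self, message):
        self._items.append(message)
    def take(self, limit):
        batch, self._items = self._items[:limit], self._items[limit:]
        return batch
```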

orip
I see - very important to keep a loose interface. What could I use other than a DB when you talk about having a different server deliver my mail?
Abs
You could use a message queuing system, you could append to a _different_ DB that's optimized for that sort of insert/delete behavior without burdening your other DB, you could buffer it in memory and flush to files that are then consumed - really, you're just keeping your options open.
orip
Ah ok, thank you for the clarification. Really thank you :)
Abs
+2  A: 

One aspect you might have overlooked is zipping speed. It might be in your best interest to use a lighter compression level in your zip process, as that can make a large improvement in zip time (easily double), which adds up to a lot once you get into the realm of multiple users.

Even better, make the zipping intelligent: use no compression when you're zipping large already-compressed files (MP3, ZIP, DOCX, XLSX, JPG, GIF, etc.) and high compression when you have simple text files (TXT, XML, DOC, XLS, etc.), as those will zip very quickly even with heavy compression.
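A quick sketch of that per-extension choice using Python's zipfile module (the extension list is illustrative, not exhaustive):

```python
import zipfile

# formats that are already compressed; deflating them again wastes CPU
ALREADY_COMPRESSED = {".mp3", ".zip", ".docx", ".xlsx", ".jpg", ".gif", ".png"}

def compression_for(filename):
    # pick store-only for compressed formats, deflate for everything else
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return zipfile.ZIP_STORED if ext in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED

def smart_zip(archive_path, files):
    with zipfile.ZipFile(archive_path, "w") as zf:
        for path in files:
            zf.write(path, compress_type=compression_for(path))
```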

TravisO
Ah, thank you, great insight. I will look into this!
Abs
+1  A: 

An important point might be that rather than having a cron job run once every second, you should have an always-running daemon that's automatically restarted on exit - or something like that.

One reason is, just as you describe it yourself: if a lot of users request emails to be sent out and the queue builds up, one cron job won't have time to finish before the next one starts, and you risk having your system flooded with processes.
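If you do stick with cron, one common guard against that pile-up is a non-blocking file lock, so a slow run simply makes the next tick skip (sketch, assuming a Unix host; the lock path is arbitrary):

```python
import fcntl

def run_exclusively(lockfile, work):
    """Run work() only if no other instance holds the lock;
    a still-busy previous cron run makes this tick a no-op."""
    fh = open(lockfile, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return False          # another worker is still busy; skip
    try:
        work()
        return True
    finally:
        fcntl.flock(fh, fcntl.LOCK_UN)
        fh.close()
```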

eliego
Good point - thanks for that. Also, would you recommend that I process everything in the queue, or only some of it? Remember, all this is time critical - the faster the better :)
Abs
Depending on your setup, the most efficient approach might be to fetch a bunch of email jobs at the same time, then process them in parallel with multiple keep-alive connections to the SMTP server running. But it all depends on how you implement your queue and how you actually send your e-mails.
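Roughly like this - the send function is injected here so the scheduling is visible on its own; in practice it would wrap an smtplib connection kept alive per worker:

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(jobs, send_one, workers=4):
    """Fan a batch of queued email jobs out over a small worker pool.
    `send_one` is whatever actually delivers one message."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves job order in the returned results
        return list(pool.map(send_one, jobs))
```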
eliego
+1  A: 

Is the above a good idea? yes

could there be a better solution to handle millions of users down the road? possibly.. but thats not what is important. what is important is that you have build in the layer of abstraction. If some day down the road you have crazy traffic and your cron queue isnt keeping up you can replace the functionality of that layer without changing any of the code which uses it.

mike
+1  A: 

Hmm. I don't really like the idea of cron running something every second. That seems like way too often. If your application really needs to be that responsive, then I think you should keep it synchronous. That is, keep the processing in your web application and look for other ways to keep the server strain level down.

If you can wait longer between checking, then it would be better to have your cron job check the queue for 1 item at a time. If there is one, process it, and then check again for the next one without exiting. If there isn't one, exit and don't try again for five minutes or so.
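That check-until-empty loop can be sketched like so (the fetch and process callables are placeholders for your own queue access):

```python
def drain(fetch_next, process):
    """One cron invocation: keep taking jobs until the queue is
    empty, then exit so the next scheduled run starts fresh."""
    handled = 0
    while True:
        job = fetch_next()
        if job is None:
            return handled    # queue empty; cron retries later
        process(job)
        handled += 1
```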

However, all that being said, any decent Mail Transfer Agent (sendmail, postfix, Exchange) has queuing built in. It will probably do a better job than you could of making sure delivery occurs when the unexpected happens. There's a lot to think about in processing queued e-mail. I generally prefer to hand off outbound e-mail to an MTA as early in the process as I can.
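That hand-off is typically just a submission to the MTA on localhost, which then owns retries and delivery (sketch; host and addresses are placeholders, and `hand_off` assumes a local MTA is listening):

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, to, subject, body):
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, to, subject
    msg.set_content(body)
    return msg

def hand_off(msg, host="localhost"):
    # from here on, the local MTA queues, retries, and delivers
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```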


bmb
+1  A: 

Build in something that supports distributed queuing. As volume grows, you can then scale whichever layer of your tier turns out to be the bottleneck.

Is there a reason to run the cron every second? Is the volume that high? I would say do your best to keep it an n-tier implementation, because you will swap things in and out and refactor bits as they fight for your attention.

Try not to build anything you design for a few weeks. Other scenarios will often come to you in that time, before things get locked in.

Jas Panesar
Thank you for the excellent advice, especially the last line. :)
Abs
Glad it was of use. The last line, incidentally, is the hardest to do. I can barely wait a week or two. When I am able to, I am handsomely rewarded.
Jas Panesar
+1  A: 

Good approach. Some refinements:

  1. Don't use a cron job; instead, query on a timer.
  2. Include a state flag to keep multiple readers/writers sorted.
  3. The reader should drain the queue - don't stop until the queue is empty.
  4. Keep it simple. Put the complexity and subtlety into the writer/reader conversation.

In my experience this will scale nicely.
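Point 2 is the subtle one; a minimal sketch of a state-flag claim (table and column names are illustrative, SQLite only for brevity - a real setup would add timestamps and retry handling):

```python
import sqlite3

def make_jobs(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        state TEXT DEFAULT 'pending',
        owner TEXT)""")

def claim_one(conn, worker_id):
    """The conditional UPDATE succeeds for only one reader,
    so no job is ever processed twice."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE state = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    cur = conn.execute(
        "UPDATE jobs SET state = ?, owner = ? WHERE id = ? AND state = 'pending'",
        ("taken", worker_id, row[0]))
    return row[0] if cur.rowcount == 1 else None  # lost the race: try again
```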

le dorfier