We have a console application (currently .NET) that sends out mail in bulk to people that have subscribed to a mailing list. We are re-implementing this application because of limitations that it currently has.

One of the features it must have is that it can resume after an unexpected interruption of its operation. This means that every time it successfully sends an e-mail, it has to keep track of its progress in a way that lets it pick back up right where it left off. It'll get the information it needs (basically the list of recipients, who are identified by a numeric id) from a different server, which hosts the database containing this information.

Our setup is simple: we have one Windows-based web/database server that contains the recipients, and a Debian machine that runs the SMTP server.

We have come up with several options that would solve this:

  1. Send a signal back to the database after every send operation
  2. Keep track in a small file, writing only the id of the last recipient to this file (overwriting its contents with each write) after every send operation.
  3. Keep track in a database that runs on the host machine (MySQL, PostgreSQL, SQLite, etc.)
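A minimal sketch of option 2, assuming recipients are processed in ascending id order. The real application is .NET; Python is used here only for brevity, and the file name is made up. Writing to a temporary file and renaming it keeps the checkpoint intact even if the process dies mid-write:

```python
import os

CHECKPOINT = "progress.txt"  # illustrative file name

def save_checkpoint(last_id):
    # Write to a temp file first, then rename over the real file.
    # os.replace is atomic on POSIX, so a crash can never leave a
    # half-written checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(last_id))
        f.flush()
        os.fsync(f.fileno())  # make sure it actually hit the disk
    os.replace(tmp, CHECKPOINT)

def load_checkpoint():
    # Returns 0 when no checkpoint exists (fresh batch).
    try:
        with open(CHECKPOINT) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0
```

On resume, the application would ask the database only for recipients with an id greater than `load_checkpoint()`.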

The constraints are that the application is supposed to send mails fast. The number of mails per batch varies from several hundred to several tens of thousands, and there can be several batches per day, too. Overall it's usually between 1,000 and 50,000 mails a day, but this will grow. Also, it must be able to resume accurately, so I can't wait until, say, 50 mails are sent and only then write the progress to a file or database.

This is what I came up with so far with regards to the above solutions:

  1. Our current application uses this solution. But the new application will run on a different server than the database server (they aren't in the same network either; the application will run on the mail server, as opposed to the current situation), so I can't imagine that being the most efficient solution.
  2. This could be very fast, but wouldn't it strain the hard drive to the point where its lifespan could be severely shortened? (I believe this server is an older Opteron; it may pre-date SATA, but if so, not by much.)
  3. This may be very fast and efficient, but would it be necessary to set up a database just to store two numbers (the id of the batch, and the id of the last recipient within that batch)? Would the overhead maybe slow this down?
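For what it's worth, option 3 doesn't have to mean a full database server: an embedded database like SQLite is just a file plus a library, with no daemon to administer. A sketch of storing those two numbers (table and column names are made up; Python's bundled sqlite3 module is used for illustration, with an in-memory database standing in for a file on disk):

```python
import sqlite3

# ":memory:" stands in for a real file, e.g. sqlite3.connect("progress.db");
# the table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS progress ("
    "  batch_id INTEGER PRIMARY KEY,"
    "  last_recipient_id INTEGER)"
)

def record_progress(batch_id, recipient_id):
    # One row per batch, overwritten after every successful send.
    conn.execute(
        "INSERT OR REPLACE INTO progress (batch_id, last_recipient_id) "
        "VALUES (?, ?)",
        (batch_id, recipient_id),
    )
    conn.commit()

def resume_point(batch_id):
    # Returns 0 when the batch hasn't started yet.
    row = conn.execute(
        "SELECT last_recipient_id FROM progress WHERE batch_id = ?",
        (batch_id,),
    ).fetchone()
    return row[0] if row else 0
```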

Apart from the above solutions, are there other options I haven't yet considered, to keep track without really slowing the application down? Are my assumptions accurate?

+1  A: 

Hi Rob,

1000-50000 emails per day doesn't seem like an awful lot to me, so I don't think you will have to worry too much about capacity at the moment. Where I work we have a single instance of a Windows service which reads 100 rows from a database (where our email data is stored) at a time, processes each row in succession and updates the database to mark the email as sent. I'm not saying this is a good design (it isn't) but we regularly send more than 50k emails per day using this setup.

If you have a real need to scale - i.e. one you can quantify in terms of growth over the next 3, 6, 12 months and which shows significant growth - then I'd put real effort into scalability now. If you don't, I'd focus on keeping it simple and lightweight.

Why not mark each email message as "in process" while it's being processed by your bulk email application, and then mark it as "sent" (both in the db) when the work is done? This approach could allow you to multi-thread your application as scale demands as well (if you design for that, of course).
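The queued / in-process / sent approach described above could be sketched like this (the schema and status values are illustrative, and SQLite stands in for the real database):

```python
import sqlite3

# SQLite stands in for the real database; the schema and the status
# values ('queued', 'in_process', 'sent') are made up for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emails (id INTEGER PRIMARY KEY, "
    "status TEXT NOT NULL DEFAULT 'queued')"
)
conn.executemany("INSERT INTO emails (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])

def claim_next():
    # Mark the next queued email as in-process and return its id,
    # or None when the batch is finished.
    row = conn.execute(
        "SELECT MIN(id) FROM emails WHERE status = 'queued'"
    ).fetchone()
    if row[0] is None:
        return None
    conn.execute("UPDATE emails SET status = 'in_process' WHERE id = ?",
                 (row[0],))
    conn.commit()
    return row[0]

def mark_sent(email_id):
    conn.execute("UPDATE emails SET status = 'sent' WHERE id = ?",
                 (email_id,))
    conn.commit()

# After a crash, rows still marked 'in_process' are exactly the ones
# whose delivery was interrupted and may need inspection or a resend.
```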

Øyvind
@Øyvind thanks for your answer! We're not worried about the capacity per se, but we do need this application to send the mails to the SMTP server as fast as possible, coupled with high reliability in terms of resuming interrupted batches. If we were to send a query to the database server, which is a different machine, after every mail, we'd have almost the same situation as now. It's currently too slow, at some 60-80 mails per minute.
Rob
Hi Rob. Happy to help if I can. Have you unequivocally identified the db connection as being the culprit? Having the db on a separate machine is pretty normal, so what's causing this to slow down your app? Is it your network? How much data are you loading in a single attempt?
Øyvind
@Øyvind Right now, the database is not why the application is too slow. The problem is that it has to connect to another machine (the SMTP server) for each mail. I fear the situation will be reversed after the application is moved to the SMTP server, where it'd have to connect to the database server for each mail. It wouldn't be a problem if these machines were in the same network, but they're not. They have to connect with one another over the internet.
Rob
What about installing an SMTP server on the same machine as where your app runs, or in the network? If latency is the problem (when connecting to your SMTP server over the internet) then that may solve it?
Øyvind