I have an application that, in its simplest form, reads a large number of phone numbers (about 15 million) from a database and sends them, one number at a time, to a URL for processing. I designed the application like this:
- Bulk-export the phone numbers from SQL to a text file using SSIS. This is very quick, a matter of 1 or 2 minutes.
- Load the numbers into a message queue (I use MSMQ at the moment).
- Dequeue the messages from a command-line application and send requests over HTTP to some service (about 3 calls per phone number), then log to a database.
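To make the flow above concrete, here is a minimal sketch in Python rather than the actual implementation. The standard-library `queue.Queue` stands in for MSMQ, and `send` and `log` are hypothetical callables standing in for the HTTP call and the database logging; the worker count and queue size are arbitrary assumptions.

```python
import queue
import threading
from typing import Callable, Iterable

CALLS_PER_NUMBER = 3  # "about 3 calls per phone number" in the description above


def load_queue(lines: Iterable[str], q: queue.Queue) -> None:
    """Put one message per phone number on the queue, skipping blank lines."""
    for line in lines:
        number = line.strip()
        if number:
            q.put(number)


def worker(q: queue.Queue, send: Callable, log: Callable) -> None:
    """Dequeue numbers until a None sentinel arrives; call the service and log."""
    while True:
        number = q.get()
        if number is None:  # sentinel: no more work for this worker
            return
        for call_index in range(CALLS_PER_NUMBER):
            try:
                result = send(number, call_index)  # hypothetical HTTP call
            except Exception as exc:
                result = exc  # record the failure instead of killing the worker
            log(number, call_index, result)  # hypothetical database logging


def run(lines: Iterable[str], send: Callable, log: Callable,
        worker_count: int = 8) -> None:
    """Start the workers, feed the queue, then signal shutdown and wait."""
    q: queue.Queue = queue.Queue(maxsize=1000)  # bounded so the loader cannot run far ahead
    threads = [threading.Thread(target=worker, args=(q, send, log))
               for _ in range(worker_count)]
    for t in threads:
        t.start()
    load_queue(lines, q)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
```

With the exported text file, this would be driven by something like `run(open("numbers.txt"), send, log)`, where the file name is also just a placeholder.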
The problem is that it still takes a long time to complete. MSMQ also has a limit on the size of messages it can take, so I now have to create multiple message queues. I need a lot of fault tolerance, but I dare not make my message queue transactional because of the performance cost. I'm thinking of publishing the message queue (currently a private queue) to Active Directory so that processes on different machines can dequeue from it and the job completes sooner. Also, my processors hit 100% during execution, and I'm currently changing the application to use a thread pool. I'm willing to explore JMS if it will handle the queue better. So far, the most efficient part of the whole process is the SSIS part.
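For a sense of scale, 15 million numbers at 3 calls each is 45 million HTTP requests. The back-of-the-envelope sketch below estimates the run time from that figure; the per-request latency and the number of requests in flight are purely assumed values, not measurements from this application.

```python
def estimated_hours(numbers: int, calls_per_number: int,
                    latency_s: float, concurrency: int) -> float:
    """Rough wall-clock estimate, assuming requests are the only cost
    and `concurrency` requests are always in flight."""
    total_requests = numbers * calls_per_number
    return total_requests * latency_s / concurrency / 3600


# Assumed 200 ms per request (hypothetical); vary the concurrency.
for concurrency in (1, 10, 100):
    hours = estimated_hours(15_000_000, 3, 0.2, concurrency)
    print(f"{concurrency:>3} in flight: about {hours:,.0f} hours")
```

Under those assumptions, one request at a time would take roughly 2,500 hours, while 100 requests in flight would bring that down to roughly 25 hours, so the number of concurrent requests matters far more than the queueing technology itself.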
I'd like to hear about better design approaches, especially if you've handled this kind of volume before. I'm ready to switch to Unix or use Lisp if it handles this kind of situation better.
Thanks.