views:

82

answers:

1

I work for a software shop that has an in-house predictive dialer product, and we need to implement a solution to comply with DO-NOT-CALL lists.

Basically, I have a database with the customers/prospective customers that I need to call, and another database with the phone numbers I can't call. As the system is a predictive dialer, it dials more or fewer calls per logged-in agent based on the performance of the operation, time averages, and so on. Usually this 'magic' number is around 3 - 4 calls per logged-in agent.

The phone number repository for the predictive dialer is a PostgreSQL database. The predictive dialer picks a batch of numbers from the database and sends a command to the PBX to dial them; the business logic then transfers the valid calls to the call center clerks, and so on (this part is irrelevant, as my problem occurs before the call).

I need to implement the do-not-call list functionality. This do-not-call list will be provided to our company by a government agency, in a CSV file, on a daily basis. Every time I receive a new CSV file, I have to purge the old do-not-call list and put the new one in place.
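One way to handle the daily purge-and-replace is to load the CSV into a staging table and then swap it in atomically, so the dialer never sees a half-loaded list. Here is a minimal sketch of that pattern; the table and column names (`do_not_call`, `phone`) are my own invention, and I'm using Python's stdlib SQLite in place of PostgreSQL purely so the example is self-contained (in Postgres you'd use `COPY` for the bulk load and the same rename-swap trick inside a transaction):

```python
import csv
import io
import sqlite3

def refresh_dnc_list(conn, csv_text):
    """Replace the do-not-call list atomically from a daily CSV dump.

    Hypothetical CSV layout: one phone number per line. The new data
    is loaded into a staging table first, then renamed into place, so
    readers always see either the old complete list or the new one.
    """
    conn.isolation_level = None  # manage the transaction explicitly
    rows = [(r[0].strip(),) for r in csv.reader(io.StringIO(csv_text)) if r]
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute("DROP TABLE IF EXISTS do_not_call_new")
    cur.execute("CREATE TABLE do_not_call_new (phone TEXT PRIMARY KEY)")
    cur.executemany("INSERT INTO do_not_call_new VALUES (?)", rows)
    cur.execute("DROP TABLE IF EXISTS do_not_call")
    cur.execute("ALTER TABLE do_not_call_new RENAME TO do_not_call")
    cur.execute("COMMIT")
```

The `PRIMARY KEY` on `phone` doubles as the index you'd want for fast membership checks later.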

My first thought was to implement it as a batch process, cross-referencing the DO NOT CALL list with my current customer database. But I suspect that, depending on the size of both databases, the cross-referencing would be very performance-intensive and might not finish overnight. I've had this kind of problem with batch processing before, and it's not a nice thing to see.
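For what it's worth, the cross-reference doesn't have to be a row-by-row batch job: with an index on the do-not-call table, a set-based anti-join lets the database skip blocked numbers at query time. A sketch, again with made-up table names and stdlib SQLite standing in for PostgreSQL:

```python
import sqlite3

# Hypothetical schema: customers(phone), do_not_call(phone).
# With phone indexed (PRIMARY KEY), the NOT EXISTS check is an
# indexed lookup per candidate row, not a full table scan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (phone TEXT PRIMARY KEY);
    CREATE TABLE do_not_call (phone TEXT PRIMARY KEY);
    INSERT INTO customers VALUES ('5551111'), ('5552222'), ('5553333');
    INSERT INTO do_not_call VALUES ('5552222');
""")

callable_numbers = [row[0] for row in conn.execute("""
    SELECT c.phone FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM do_not_call d WHERE d.phone = c.phone)
    ORDER BY c.phone
""")]
print(callable_numbers)  # ['5551111', '5553333']
```

The same `NOT EXISTS` query could simply be folded into whatever query the dialer already uses to pick its next batch of numbers.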

My second idea came from thinking about how large institutions handle high-performance, high-throughput authorization systems, such as credit card or user authentication/authorization. I thought that creating an authorization service for the DO NOT CALL list numbers, and changing my predictive dialer's algorithm to check each number against this service before dialing, would be neat.
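The core of such a service can be very small: normalize each number to bare digits and keep the blocked set in memory, giving a constant-time check per dial. A minimal sketch (the class and method names are mine, not from any existing library):

```python
def normalize(number):
    """Reduce a number to digits so '+1 (555) 2222' and '15552222' compare equal."""
    return "".join(ch for ch in number if ch.isdigit())

class DncService:
    """In-memory may-I-dial check, rebuilt from each daily CSV."""

    def __init__(self, blocked_numbers):
        self._blocked = {normalize(n) for n in blocked_numbers}

    def may_dial(self, number):
        # Set membership is O(1), so this adds negligible latency
        # to the dialer's per-call path.
        return normalize(number) not in self._blocked

svc = DncService(["555-1111", "5552222"])
print(svc.may_dial("5553333"))    # True
print(svc.may_dial("(555) 2222")) # False
```

Even a list of tens of millions of numbers fits comfortably in RAM this way; the daily CSV refresh just builds a new set and swaps the reference.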

As I'm only speculating here, I have no idea which idea is better, or whether I've got it totally wrong and should be looking in another direction. So, my question is: what would you recommend? Store the DO NOT CALL CSV file in memory? Use LDAP? MySQL? PostgreSQL? Do the batch processing thing? Or am I definitely screwed?

I know I'm not the first person in the world to have this kind of problem, so please enlighten me.

+2  A: 

Your challenge, looking up a small set of entries within a vast space of possible ones, reminds me of DNS black/block lists (DNSBLs).

rbldnsd is a small and fast DNS daemon especially made to serve DNSBL zones. It was inspired by Dan J. Bernstein's rbldns program found in the djbdns package (search for "rbldnsd" for more).

It has support for name-based zones, so you could convert the list of numbers to ENUM-style names, e.g. +1-555-4242 becomes 2.4.2.4.5.5.5.1.e164.arpa. Each of these is entered into the rbldnsd data file, compiled into memory, and queried like any other blocklist: no entry means the number can be called; if the entry exists, the number is on the do-not-call list.
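The ENUM-style conversion itself is a one-liner: strip the number to digits, reverse them, and join with dots under the e164.arpa suffix. A sketch:

```python
def to_enum_name(number):
    """Convert a phone number to an ENUM-style DNS name:
    digits reversed, dot-separated, under e164.arpa."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return ".".join(reversed(digits)) + ".e164.arpa"

print(to_enum_name("+1-555-4242"))  # 2.4.2.4.5.5.5.1.e164.arpa
```

Running this over each line of the daily CSV would produce the name list to feed into the rbldnsd data file.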

You've still got the batch conversion problem, though it would be a fairly simple script, quite doable in Perl or AWK. You might also be able to split the incoming CSV file into multiple files for parallel processing, with a final merge.

Alister Bulman