Ok, I'll try and keep this short, sweet and to-the-point.
We do our GeoIP updates by uploading a MASSIVE CSV file to our PHP-based CMS. The file usually has more than 100k records of IP address information. Now, a simple import of this data isn't an issue at all, but we have to run checks against our current regional IP address mappings.
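For context, the plain import side of it is just a streaming read of the file, roughly like this (a simplified sketch; the file path and column names like 'ipStart' and 'countryName' are made up for illustration):

```php
<?php
// Simplified sketch of the plain import: stream the CSV row by row
// instead of loading all 100k+ records into memory at once.
// The path and column names ('ipStart', 'ipEnd', 'countryName') are assumptions.
$handle = fopen('/path/to/geoip_update.csv', 'r');
if ($handle === false) {
    die('Could not open the update file');
}

$header = fgetcsv($handle); // first row holds the column names

while (($row = fgetcsv($handle)) !== false) {
    $record = array_combine($header, $row); // e.g. ['ipStart' => ..., 'countryName' => ...]
    // ...validation and rule checks happen here for every record...
}

fclose($handle);
```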
This means that we must validate the data, compare and split overlapping IP address ranges, etc. And these checks must be made for each and every record.
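To give an idea of what the per-record checks involve, the overlap comparison itself boils down to something like this (a rough sketch, assuming 64-bit PHP; the real code also has to work out where to split):

```php
<?php
// Rough sketch of the overlap check run for each incoming record.
// ip2long() gives integer boundaries we can compare directly
// (assuming 64-bit PHP so the values stay positive).
function rangesOverlap(string $startA, string $endA, string $startB, string $endB): bool
{
    $aStart = ip2long($startA);
    $aEnd   = ip2long($endA);
    $bStart = ip2long($startB);
    $bEnd   = ip2long($endB);

    // Two ranges overlap when each one starts before the other one ends.
    return $aStart <= $bEnd && $bStart <= $aEnd;
}

// Splitting is only needed when this returns true.
var_dump(rangesOverlap('1.0.0.0', '1.0.0.255', '1.0.0.128', '1.0.1.0')); // bool(true)
```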
Not only that, but I've just created a field mapping solution that would allow other vendors to implement their GeoIP updates in different formats. This is done by applying rules to the IP records within the CSV update.
For instance, a rule might look like:
if 'countryName' == 'Australia' then send to the 'Australian IP Pool'
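In code, a rule is basically a field, an operator, a value and a target pool that get applied to each record; a stripped-down sketch (the structure and names here are illustrative, not our exact schema):

```php
<?php
// Stripped-down sketch of how one mapping rule is applied to one record.
// The rule structure and field names are illustrative, not our exact schema.
$rule = [
    'field'  => 'countryName',
    'op'     => '==',
    'value'  => 'Australia',
    'target' => 'Australian IP Pool',
];

function ruleMatches(array $rule, array $record): bool
{
    if (!isset($record[$rule['field']])) {
        return false;
    }

    switch ($rule['op']) {
        case '==':
            return $record[$rule['field']] === $rule['value'];
        case '!=':
            return $record[$rule['field']] !== $rule['value'];
        default:
            return false; // unknown operator: treat as no match
    }
}

$record = ['ipStart' => '1.0.0.0', 'ipEnd' => '1.0.0.255', 'countryName' => 'Australia'];

if (ruleMatches($rule, $record)) {
    // send the record to $rule['target'], i.e. the 'Australian IP Pool'
}
```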
There might be multiple rules to run, and every rule must be applied to each IP record. For instance, 100k records checked against 10 rules would be 1 million iterations; not fun.
We're finding that 2 rules against 100k records take up to 10 minutes to process. I'm fully aware that the bottleneck here is the sheer number of iterations that must occur for a successful import; I'm just not aware of any other options we may have to speed things up a bit.
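The hot path is effectively this nested loop (simplified; in reality the validation and overlap checks from above run inside it too, and the sample data is just for illustration):

```php
<?php
// Simplified shape of the hot path: every rule is applied to every record,
// so the work grows as (records) x (rules). In production $records holds
// the 100k+ rows parsed from the CSV; the two rows here are just samples.
$records = [
    ['ipStart' => '1.0.0.0', 'ipEnd' => '1.0.0.255', 'countryName' => 'Australia'],
    ['ipStart' => '2.0.0.0', 'ipEnd' => '2.0.0.255', 'countryName' => 'France'],
];

$rules = [
    ['field' => 'countryName', 'value' => 'Australia', 'target' => 'Australian IP Pool'],
];

foreach ($records as $record) {          // ~100k iterations in practice
    foreach ($rules as $rule) {          // x number of rules (2-10)
        if (($record[$rule['field']] ?? null) === $rule['value']) {
            // send $record to $rule['target'], plus the validation/overlap checks
        }
    }
}
```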
Someone recommended splitting the file into chunks, server-side. I don't think this is a viable solution as it adds yet another layer of complexity to an already complex system. The file would have to be opened, parsed and split. Then the script would have to iterate over the chunks as well.
So, the question is: considering what I just wrote, what would be the BEST method to speed this process up a bit? Upgrading the server's hardware JUST for this tool isn't an option, unfortunately, but they're pretty high-end boxes to begin with.
Not as short as I thought, but yeah. Halps? :(