I have a list of 9 million IPs and, with a set of hash tables, I can make a constant-time function that returns whether a particular IP is in that list. Can I do this in PHP? If so, how?
views: 159 · answers: 5
Is there a way to maintain a 200MB immutable data structure in memory and access it from a script?
I think throwing it in memcache would probably be your best/fastest method.
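As a sketch of the memcached approach (the `ip:` key prefix is my own convention, and the server address in the usage comment is an assumption):

```php
<?php
// Sketch only: assumes the pecl Memcached extension is installed.
function ip_key(string $ip): string {
    return 'ip:' . $ip;  // "ip:" prefix is a made-up convention
}

// A cache hit means the IP is in the list; get() returns false on a miss.
function ip_in_memcache(Memcached $mc, string $ip): bool {
    return $mc->get(ip_key($ip)) !== false;
}

// Usage (needs a running memcached server; address is an assumption):
// $mc = new Memcached();
// $mc->addServer('127.0.0.1', 11211);
// foreach ($ips as $ip) { $mc->set(ip_key($ip), 1); }  // one-time load
// $hit = ip_in_memcache($mc, '203.0.113.7');
```

Loading 9 million keys is a one-time cost; after that every lookup is a single O(1) cache hit.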
This sounds to me like an ideal application for a Bloom filter. Have a look at the links provided, which might help you get it done ASAP.
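A minimal sketch of the idea (my own toy implementation, PHP 8+, not a vetted library): the filter sets k bit positions per IP in a bit array, and a lookup answers "definitely not in the list" or "probably in the list" in constant time, using far less memory than the raw list.

```php
<?php
// Toy Bloom filter: $bits bit positions, $k salted hashes per item.
class BloomFilter {
    private string $bitmap;
    public function __construct(private int $bits = 1 << 24, private int $k = 7) {
        $this->bitmap = str_repeat("\0", intdiv($bits, 8) + 1);
    }
    private function positions(string $item): array {
        $pos = [];
        for ($i = 0; $i < $this->k; $i++) {
            // Derive k distinct positions by salting the hash with the round number.
            $pos[] = crc32($i . ':' . $item) % $this->bits;
        }
        return $pos;
    }
    public function add(string $item): void {
        foreach ($this->positions($item) as $p) {
            $byte = intdiv($p, 8);
            $this->bitmap[$byte] = chr(ord($this->bitmap[$byte]) | (1 << ($p % 8)));
        }
    }
    public function mightContain(string $item): bool {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bitmap[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false;  // definitely not in the set
            }
        }
        return true;  // probably in the set (false positives possible)
    }
}
```

Rough sizing: at about 10 bits per entry (around 11 MB for 9 million IPs) with k = 7, the false-positive rate is on the order of 1%; if false positives matter, confirm hits against the authoritative list.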
If reading the file into SQLite is an option, you could benefit from indexes, speeding up lookups.
Otherwise memcached is an option, but I don't know how checking for existence would go if you do it with pure PHP lookups (my guess: rather slow).
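A sketch of the SQLite route, assuming the pdo_sqlite extension; the table name is made up, and an in-memory database is used here for brevity (point the DSN at a file to persist it):

```php
<?php
// Assumes the pdo_sqlite extension; use e.g. 'sqlite:/path/ips.sqlite' to persist.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One-time load: the PRIMARY KEY creates an index, so lookups are O(log n), not a scan.
$db->exec('CREATE TABLE IF NOT EXISTS ips (ip TEXT PRIMARY KEY)');
$ins = $db->prepare('INSERT OR IGNORE INTO ips VALUES (?)');
$ins->execute(['203.0.113.7']);

// Membership check:
$stmt = $db->prepare('SELECT 1 FROM ips WHERE ip = ? LIMIT 1');
$stmt->execute(['203.0.113.7']);
$found = $stmt->fetchColumn() !== false;  // true
```

For the one-time import of 9 million rows, wrapping the inserts in a single transaction makes a large difference.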
The interesting thing about this question is the number of directions you can go.
I'm not sure if caching is your best option simply because of the large set of data and the relatively low number of queries on it. Here are a few ideas.
1) Build a RAM disk and put your MySQL table's data on the ramdisk partition. I've never tried this, but it would be fun to try.
2) Linux generally has a very fast file system. Build a structured file system that breaks up the records into files, and just call file_get_contents() or file_exists(). Of course this solution would require you to build and maintain the file system, which would also be fun. rsync might be helpful to keep your live filesystem up to date.
Example:
/002/209/001/299.txt
<?php
// Maps an IP such as 2.209.1.299 to the path /002/209/001/299.txt
function build_file_from_ip($ip) {
    $p = array_map(fn($o) => str_pad($o, 3, '0', STR_PAD_LEFT), explode('.', $ip));
    return "/{$p[0]}/{$p[1]}/{$p[2]}/{$p[3]}.txt";
}
$file = build_file_from_ip($_GET['ip']);
if (file_exists($file)) {
    // Execute your code.
}
?>
Have you tried a NoSQL solution like Redis? The entire data set is managed in memory.
Here are some benchmarks.
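A sketch of the Redis route using a set, assuming the phpredis extension (the set name `ip_list` and the server address are my own placeholders); SISMEMBER answers membership in O(1) regardless of set size:

```php
<?php
// Sketch only: assumes the phpredis extension; 'ip_list' is a made-up set name.
function ip_in_redis_set(Redis $r, string $ip): bool {
    // SISMEMBER is O(1) regardless of how many IPs the set holds.
    return $r->sIsMember('ip_list', $ip);
}

// Usage (needs a running Redis server; address is an assumption):
// $r = new Redis();
// $r->connect('127.0.0.1', 6379);
// foreach ($ips as $ip) { $r->sAdd('ip_list', $ip); }  // one-time load
// $hit = ip_in_redis_set($r, '203.0.113.7');
```

A set of 9 million short strings fits comfortably in the 200MB budget mentioned in the question, and the data survives PHP request boundaries since it lives in the Redis process.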