views: 159
answers: 5
I have a list of 9 million IPs and, with a set of hash tables, I can build a constant-time function that returns whether a particular IP is in that list. Can I do this in PHP? If so, how?

+2  A: 

I think throwing it in memcache would probably be your best/fastest method.
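
A minimal sketch of the memcache idea, assuming the PECL Memcached extension and a server on localhost:11211 (the `ip:` key prefix is an arbitrary choice): store one key per IP, so a cache hit means the IP is in the list.

```php
<?php
// Assumes the PECL Memcached extension and a memcached server on
// localhost:11211. The 'ip:' key prefix is an arbitrary choice.
function ip_key($ip) {
    return 'ip:' . $ip;
}

if (class_exists('Memcached')) {
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // Load phase -- run once per IP in the 9M list:
    //   $mc->set(ip_key($ip), 1);

    // Lookup phase -- we only ever store truthy values, so a non-false
    // result means the IP is in the list:
    $hit = $mc->get(ip_key('192.0.2.1')) !== false;
}
```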

Zak
+3  A: 

This sounds to me like an ideal application for a Bloom filter. Have a look at the links below, which might help you get it done quickly.

  1. http://github.com/mj/php-bloomfilter
  2. http://code.google.com/p/php-bloom-filter/
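
A Bloom filter answers "possibly in the set" in constant time using a few bits per IP, at the cost of a tunable false-positive rate (false negatives never occur). The sketch below is my own minimal plain-PHP version, not the API of either linked library; salting `crc32` to derive the k hash positions is a simplification chosen for brevity, not hash quality.

```php
<?php
// Minimal Bloom filter sketch: an m-bit array packed into a string,
// with k salted-crc32 hash positions per key. False positives are
// possible (tunable via m and k); false negatives are not.
class BloomFilter {
    private $bits;
    private $m;
    private $k;

    public function __construct($m = 67108864, $k = 7) { // default: 8 MB of bits
        $this->m = $m;
        $this->k = $k;
        $this->bits = str_repeat("\0", (int) ceil($m / 8));
    }

    private function positions($ip) {
        $pos = array();
        for ($i = 0; $i < $this->k; $i++) {
            // Derive k hashes by salting crc32 with the index.
            $pos[] = abs(crc32($i . ':' . $ip)) % $this->m;
        }
        return $pos;
    }

    public function add($ip) {
        foreach ($this->positions($ip) as $p) {
            $byte = $p >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p & 7)));
        }
    }

    public function mightContain($ip) {
        foreach ($this->positions($ip) as $p) {
            if ((ord($this->bits[$p >> 3]) & (1 << ($p & 7))) === 0) {
                return false; // a zero bit proves the IP was never added
            }
        }
        return true;
    }
}
```

For 9M IPs, roughly 10 bits per element (about 11 MB) with k = 7 gives a false-positive rate under 1%.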
fuentesjr
+2  A: 

If reading the file into SQLite is an option, you could benefit from indexes, speeding up lookups.
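
A sketch of the SQLite route, assuming the pdo_sqlite extension; an in-memory database is used here for brevity, though pointing the DSN at a file works the same way. The primary key gives an indexed B-tree lookup rather than a table scan:

```php
<?php
// Assumes the pdo_sqlite extension. ':memory:' keeps the sketch
// self-contained; use a file DSN for a persistent 9M-row table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE ips (ip TEXT PRIMARY KEY)');

// Load phase -- run once over the IP list (wrap the inserts in a
// transaction for speed when loading millions of rows):
$insert = $db->prepare('INSERT OR IGNORE INTO ips (ip) VALUES (?)');
$insert->execute(array('192.0.2.1'));

// Lookup phase -- the PRIMARY KEY index makes this a B-tree probe:
$lookup = $db->prepare('SELECT 1 FROM ips WHERE ip = ?');
$lookup->execute(array('192.0.2.1'));
$found = $lookup->fetchColumn() !== false;
```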

Otherwise, memcached is an option, but I don't know how checking for existence would go with pure PHP lookups (rather slow, my guess).

ChrisR
+2  A: 

The interesting thing about this question is the number of directions you can go.

I'm not sure if caching is your best option simply because of the large set of data and the relatively low number of queries on it. Here are a few ideas.

1) Build a RAM disk. Point your MySQL database table at the ramdisk partition. I've never tried this, but it would be fun to try.

2) Linux file systems are generally very fast. Build a directory structure that breaks the records up into files, and just call file_get_contents() or file_exists(). Of course, this solution would require you to build and maintain the file system, which would also be fun. rsync might be helpful to keep your live filesystem up to date.

Example:

/002/209/001/299.txt

<?php
// build_file_from_ip() pads each octet, mapping e.g. 2.209.1.299 to /002/209/001/299.txt
function build_file_from_ip($ip) {
    $pad = function ($o) { return str_pad($o, 3, '0', STR_PAD_LEFT); };
    return '/' . implode('/', array_map($pad, explode('.', $ip))) . '.txt';
}

$file = build_file_from_ip($_GET['ip']);
if (file_exists($file)) {
    // Execute your code.
}
?>
Dooltaz
You could also go the multi-server route. Based on the IP, use a different database connection (or API call). Splitting up the data to multiple servers would put less stress on each one, thus increasing speed of access.
Dooltaz
It also seems you can store the table itself in memory. http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html Take a look at the MEMORY (HEAP) storage engine. You may be able to try it instead of MyISAM.
Dooltaz
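
A sketch of the MEMORY-engine suggestion above, assuming MySQL 5.1 as in the linked manual page; MEMORY tables use HASH indexes by default, which give constant-time point lookups:

```sql
-- MEMORY tables live entirely in RAM, and a HASH primary key makes
-- the WHERE ip = ... lookup a constant-time hash probe.
CREATE TABLE ip_list (
    ip CHAR(15) NOT NULL,
    PRIMARY KEY (ip) USING HASH
) ENGINE=MEMORY;

SELECT 1 FROM ip_list WHERE ip = '192.0.2.1';
```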
The interesting thing is that both of your solutions can be combined :) My HD currently has 6 million inodes, so the only way to do that would be to create another filesystem with more inodes, and I think that can be done using a RAM disk! First, though, I want to check how much the metadata takes up, because 20M files of 100 bytes each would already take up 2GB of RAM.
konr
+1  A: 

Have you tried a NoSQL solution like Redis? The entire data set is kept in memory.

Here are some benchmarks.
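
A minimal sketch of the Redis idea, assuming the phpredis extension and a server on localhost:6379 (the set name `ip_list` is an arbitrary choice): load the IPs into a Redis set once, then `SISMEMBER` gives an O(1) membership check.

```php
<?php
// Assumes the phpredis extension and a Redis server on localhost:6379.
// The set name 'ip_list' is an arbitrary choice.
function ip_in_list($redis, $ip) {
    // SISMEMBER is an O(1) hash lookup inside the Redis set.
    return (bool) $redis->sIsMember('ip_list', $ip);
}

if (class_exists('Redis')) {
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // Load phase -- run once over the 9M list:
    //   foreach ($ips as $ip) { $redis->sAdd('ip_list', $ip); }

    $found = ip_in_list($redis, '192.0.2.1');
}
```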