views:

725

answers:

3

If I have a pretty static set of data that I want to be able to access as quickly as possible. Should I cache the data into Memcached or should I store it in a HEAP table or something inside MySQL? Would one scale better than the other?

Is there some other option that's even faster?

A: 

If it's not a huge amount store it inside your own process. That's the fastest.

Robert Munteanu
+5  A: 

memcached will be faster for simple uses, hands down -- connection setup is so much cheaper on memcached, since there's no auth, buffer allocation, etc. Also, memcached is designed to easily distribute keys between multiple servers.

However, memcached is only a simple key/value store. If you need to do anything more complex to your data (even something like SELECT * WHERE x > 5), a HEAP table is much more powerful.

Robert Munteanu brings up a good point though. Your cache hierarchy should be:

  1. Globals (local to the request/process)
  2. APC (local to the server)
  3. memcache (global)

If you don't need to propagate global changes to this data, then storing it in APC makes sense. If you need to access it several times during script execution, you should also cache it in globals in your script.

Frank Farmer
So APC is faster than memcache? Even if php and memcache were on the same machine?
UltimateBrent
It should be -- it's basically just a shared memory segment, so any data access essentially only requires copying memory from one place to another. Meanwhile, memcache requests happen through sockets, there's query parsing and deserialization overhead, etc. Here's a benchmark supporting the globals -> APC -> memcached cache hierarchy: http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/
Frank Farmer
Thanks, this will work really well!
UltimateBrent
+1  A: 

The fastest option would be in-memory caching on the local system. That won't scale well to many millions of relations, but will be very fast and work well for small data sets.

I haven't done performance testing between Memcached/MySQL HEAP, but I'd guess Memcached would be faster because it doesn't have the overhead of a full relational DB engine. Memcached would almost certainly scale better, because you could distribute it between servers and have a round-robin request dispatch between them.

If you need to perform any filtering on the data before retrieving it, you should use MySQL. The performance overhead of transmitting unwanted data will probably outweigh the benefits of faster lookups.

If I were you, I'd load the data set in question into MySQL and Memcached, then run performance tests to see which is better for your data set. If there's a core of data that's accessed particularly often, consider an additional machine-local cache.

John Millikin