I'm new to using distributed caching solutions like Memcached on a large web site. I have a couple of questions and would appreciate comments from anyone who has experience with this.

  1. Obviously, the amount of data I can put into the cache depends on server RAM. Supposing I have a big enough server farm and enough RAM, is there a maximum number of objects I can put into memcached before performance starts to degrade? I ask because I figure that if I put literally millions of objects into memcached, it would take longer to index them and look them up. Is there a line to draw here?

  2. Should I cache more, smaller objects in memcached, or fewer, bigger ones? Smaller objects do involve more round trips to the DB to get them, but they are more flexible and easier to program against.

Thank you very much,

Ray.

+2  A: 

Supposing I have a big enough server farm and enough RAM, is there a maximum number of objects I can put into memcached before performance starts to degrade?

Ideally, your cache should be 100% full at all times. Memcached uses a hashing algorithm to look up keys, so as far as I know, there shouldn't be a performance penalty for storing more keys.

Should I cache more, smaller objects in memcached, or fewer, bigger ones?

I would imagine that fewer, bigger objects would be preferable, since that reduces the time spent on both database and cache lookups, but you should take this on a case-by-case basis. Unless you know that the performance difference would be drastic, I'd recommend starting with whatever is easiest and working from there if that isn't sufficient.

Jason Baker
Thank you for your input. Generally speaking, should I prefer fewer round trips with more server memory usage over more round trips with less server memory usage?
ray247
I'll put it this way: you can always buy memory much more cheaply than you can buy bandwidth. So I'd prefer fewer round trips unless I had a good reason not to.
Jason Baker
If you are paying for bandwidth used by memcached, then you probably should not be using it. That said, fewer round trips will be faster simply because of network latency, though multi-get can help a lot with this as well.
Alister Bulman
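To make the multi-get point in the comment above concrete, here is a minimal sketch. `CountingCache` is a hypothetical in-memory stand-in, not a real memcached client; it only counts how many request/response exchanges each access pattern would need. Many real clients offer a comparable batch call (often named something like `get_multi` or `get_many`), but check your client's documentation for the exact name.

```python
class CountingCache:
    """Hypothetical in-memory stand-in for a memcached client.

    It counts round trips instead of talking to a real server.
    """

    def __init__(self):
        self._store = {}
        self.round_trips = 0

    def set(self, key, value):
        self.round_trips += 1
        self._store[key] = value

    def get(self, key):
        self.round_trips += 1  # one request/response per key
        return self._store.get(key)

    def get_many(self, keys):
        self.round_trips += 1  # one request/response for the whole batch
        return {k: self._store[k] for k in keys if k in self._store}


cache = CountingCache()
keys = [f"score:{member_id}" for member_id in range(100)]
for member_id, key in enumerate(keys):
    cache.set(key, member_id * 10)

# Fetch the keys one at a time.
cache.round_trips = 0
one_by_one = {key: cache.get(key) for key in keys}
single_get_trips = cache.round_trips

# Fetch the same keys in a single batch.
cache.round_trips = 0
batched = cache.get_many(keys)
multi_get_trips = cache.round_trips

print(single_get_trips, multi_get_trips)
```

Both approaches return the same data; the batch version just pays the network latency once instead of once per key.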
+3  A: 

Memcached uses a hash table internally to give O(1) lookups; it's designed to do as little complicated work as possible.
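As a rough illustration of why lookup cost need not grow with the number of keys, here is a toy chained hash table that reports how many key comparisons each lookup takes. This is not memcached's actual implementation; the class name, the doubling policy, and the use of Python's built-in `hash` are all assumptions made for the sketch.

```python
class ChainedHashTable:
    """Toy hash table with chaining (illustrative only).

    The bucket array doubles whenever there are more items than buckets,
    which keeps the chains short as the table grows.
    """

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0

    def set(self, key, value):
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite existing entry
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > len(self._buckets):
            self._grow()

    def _grow(self):
        old_buckets = self._buckets
        new_count = len(old_buckets) * 2
        self._buckets = [[] for _ in range(new_count)]
        for bucket in old_buckets:
            for key, value in bucket:
                self._buckets[hash(key) % new_count].append((key, value))

    def get(self, key):
        """Return (value, comparisons) so the lookup cost is visible."""
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for comparisons, (existing_key, value) in enumerate(bucket, start=1):
            if existing_key == key:
                return value, comparisons
        return None, len(bucket)


def average_comparisons(item_count):
    """Fill a table with item_count keys and average the cost of finding each."""
    table = ChainedHashTable()
    for i in range(item_count):
        table.set(f"key:{i}", i)
    total = sum(table.get(f"key:{i}")[1] for i in range(item_count))
    return total / item_count


small = average_comparisons(1_000)
large = average_comparisons(100_000)
print(f"1,000 keys: {small:.2f} comparisons; 100,000 keys: {large:.2f} comparisons")
```

With a hundred times as many keys, the average number of comparisons per lookup stays in the same small range, which is the behaviour the O(1) claim describes.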

As for what to cache, big or small, it's really about what you need to store that will save you effort (bearing in mind that it's a big dumb cache: you have to help keep it synchronised if you change one piece that is also referred to elsewhere). On the site it was originally written for, LiveJournal.com, the largest block that made sense was one complete journal entry, stored as the finished HTML that could be served to anyone who was allowed to see that particular post.

I've used it for some very small entries, literally a single number against a member ID, but I generate a few thousand such entries en masse with a single database query rather than one at a time as they are required.
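The bulk approach described above might look roughly like the following sketch. Everything named here is hypothetical: the `member_stats` table, the `post_count:` key prefix, and the helper `warm_post_counts`. An in-memory SQLite database stands in for the real database and a plain dict stands in for memcached, so the example runs without any server.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table of (member_id, post_count).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE member_stats (member_id INTEGER PRIMARY KEY, post_count INTEGER)"
)
db.executemany(
    "INSERT INTO member_stats VALUES (?, ?)",
    [(member_id, member_id % 7) for member_id in range(1, 3001)],
)

# Stand-in cache: a plain dict playing the role of memcached.
cache = {}


def warm_post_counts(db, cache):
    """Run one query and store one small cache entry per member."""
    rows = db.execute("SELECT member_id, post_count FROM member_stats").fetchall()
    entries = {f"post_count:{member_id}": count for member_id, count in rows}
    # With a real client this would be a batch set rather than a dict update.
    cache.update(entries)
    return len(entries)


stored = warm_post_counts(db, cache)
print(stored, cache["post_count:8"])
```

The database is hit once for all members, and later page requests can read each tiny entry from the cache individually.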

You can optimise the daemon somewhat if you know that you will only be storing very large or very small items, but for many small entries it has enough smarts to split empty large slabs of memory into smaller chunks for use.

Alister Bulman
How long do you cache, say, your User object for? Should I cache it for something like 30 seconds or 2 minutes?
ray247
How often does the data change? If it will change often, but not be used frequently, you might not get a big benefit from caching it at all. If it rarely changes, you might cache it for weeks, but then remove it from cache when it is changed, to be re-read. PS: vote up useful answers.
Alister Bulman
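The "cache it for a long time, but remove it when it changes" idea from the comment above can be sketched as follows. `TTLCache` is a hypothetical in-memory stand-in for memcached, and `users_db`, `get_user`, and `rename_user` are illustrative names; a real application would call its own client and database instead.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for memcached with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def delete(self, key):
        self._store.pop(key, None)


ONE_WEEK = 7 * 24 * 60 * 60

users_db = {42: {"name": "Ray"}}  # stand-in for the real database
stats = {"db_reads": 0}
cache = TTLCache()


def get_user(user_id):
    """Read through the cache, falling back to the database on a miss."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        stats["db_reads"] += 1
        user = dict(users_db[user_id])
        cache.set(key, user, ONE_WEEK)
    return user


def rename_user(user_id, new_name):
    """Update the database, then drop the stale cache entry."""
    users_db[user_id]["name"] = new_name
    cache.delete(f"user:{user_id}")


get_user(42)  # miss: reads the database
get_user(42)  # hit: served from the cache
rename_user(42, "Raymond")
fresh = get_user(42)  # miss again, because the entry was deleted
print(fresh["name"], stats["db_reads"])
```

Because the entry is deleted on every write, the long expiry only acts as a safety net; readers never see the old name once the rename has run.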