I'm pretty sure this is not Server Fault material, so bear with me.

Wanting to implement a cache system for our application, we've started integrating with Memcached. Recently I started hearing about Hypertable and saw some impressive benchmarks of it.

However, I couldn't find a good comparison between the two.

Just to get things straight: I know that Hypertable is considered closer to a DB than to a cache. On the other hand, it's not exactly an RDBMS; in fact, it's exactly not an RDBMS. It has its own benefits, but the question is whether they're worth the performance cost (if any).

Thanks!

A: 

Hypertable is an implementation of the concepts in Google's Bigtable: a column-oriented DB designed for highly denormalized data, which means it doesn't need joins.
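
A toy sketch of the data model may help make "denormalized, no joins" concrete. It assumes nothing about Hypertable's real client API: `SparseTable`, `put`, `get` and `scan_family` are hypothetical names, and storing a user's orders as columns in the user's own row is just an illustrative layout. Each cell is addressed by row key, `family:qualifier` column and timestamp, and a read returns the newest version.

```python
from bisect import insort
from collections import defaultdict


class SparseTable:
    """Toy in-memory model of a Bigtable-style table (hypothetical; not Hypertable's API).

    Cells are addressed by (row key, "family:qualifier", timestamp). Rows are
    sparse: a row only holds the columns that were actually written to it.
    """

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        # Keep versions ordered by timestamp so the newest is always last.
        insort(self._rows[row][column], (timestamp, value))

    def get(self, row, column):
        # Return the newest version of one cell, or None if it was never written.
        versions = self._rows.get(row, {}).get(column)
        return versions[-1][1] if versions else None

    def scan_family(self, row, family):
        # Return the newest value of every column in one column family of a row.
        prefix = family + ":"
        return {col: versions[-1][1]
                for col, versions in sorted(self._rows.get(row, {}).items())
                if col.startswith(prefix)}


# Denormalized layout: the user's profile and orders live in the same row.
table = SparseTable()
table.put("user:42", "info:name", 1, "Alice")
table.put("user:42", "order:1001", 1, "book")
table.put("user:42", "order:1002", 2, "lamp")
table.put("user:42", "info:name", 3, "Alice B.")
print(table.get("user:42", "info:name"))       # Alice B.
print(table.scan_family("user:42", "order"))   # {'order:1001': 'book', 'order:1002': 'lamp'}
```

Because the orders sit in the same row as the user, a single row read returns both, which is what stands in for the join an RDBMS would need.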

Memcached is an in-memory caching layer which acts like a distributed hashtable, keeping your app from having to hit the actual DB.
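
Memcached servers don't coordinate with each other; it's the client library that decides which server owns a key, and that is what makes the pool behave like a distributed hashtable. Many clients can use some form of consistent hashing for this. The sketch below is a generic consistent-hash ring, not the algorithm of any particular Memcached client, and the server addresses are hypothetical.

```python
import hashlib
from bisect import bisect


class HashRing:
    """Minimal consistent-hash ring mapping keys to cache servers (illustrative sketch)."""

    def __init__(self, servers, replicas=100):
        # Place each server at `replicas` virtual points on the ring to spread load evenly.
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(replicas):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def server_for(self, key):
        # Walk clockwise to the first virtual point past the key's hash, wrapping around.
        idx = bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
ring = HashRing(servers)
print(ring.server_for("user:42"))  # one of the three servers, the same one every time
```

The point of the ring is that adding a server only moves the keys that now land on the new server's points (roughly 1/N of them), whereas a plain `hash(key) % N` scheme would remap most keys and effectively flush the cache.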

Both lend themselves well to being distributed and work well with MapReduce-style topologies, but they serve different purposes. Memcached (a DHT) speeds up access to data held in memory, while Hypertable/Bigtable are actual mechanisms for permanent data storage on disk.

McKAMEY
Why are you linking to a non-existent article in Wikipedia?
Alex
It was a typo. Wikipedia apparently has case-sensitive URLs.
McKAMEY
Thanks for the explanation, but that's not what I'm looking for. As I said, I know the difference between the two and what they're supposed to do. The question is: what are the performance differences between them? It would be nice to see a Memcached benchmark compared with a Hypertable benchmark, to see whether Memcached is actually required for a given task.
Aviad Ben Dov
I doubt you're going to find a direct comparison, as they aren't replacements for one another. You might rephrase the question to compare an SQL-DB/Memcached stack to a Hadoop-DFS/Hypertable stack, but even then there are many, many variables that would affect the answer, not least the network topology and the structure of the data. It's analogous to asking "Which is faster, a Windows machine or a Linux machine?" Answer: it depends on a lot.
McKAMEY
I agree with you: I need a concrete answer, not something vague. I guess I don't know how to ask it properly. What kind of data do you think I should provide to make this question more concrete? You mentioned network topology, for example.
Aviad Ben Dov
Unfortunately, I think your best answer is that you may have to run some tests with sample data in a configuration similar to yours. Alternatively, since Memcached is a caching layer, if you design your middle tier right you could add it later regardless of the backing store (Hypertable, HBase, SQL). This would let you do what a lot of big sites (Twitter, SO) do: essentially hold the DB in memory and treat the disk as a backup. Then your problem becomes comparing SQL/RDBMS to Hypertable/column-oriented storage, which might be easier to find a concrete answer for, though it still depends on your data.
McKAMEY
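
The "design your middle tier right" advice amounts to the cache-aside pattern. A minimal sketch, assuming a hypothetical `BackingStore` interface that an SQL, HBase or Hypertable adapter could implement; an in-process dict with a TTL stands in for Memcached, and `DictStore`/`CachedRepository` are illustrative names only.

```python
import time
from typing import Optional, Protocol


class BackingStore(Protocol):
    """Any durable store: SQL, HBase, Hypertable... (hypothetical interface)."""
    def load(self, key: str) -> Optional[str]: ...
    def save(self, key: str, value: str) -> None: ...


class DictStore:
    """Stand-in for a slow persistent store; counts reads so cache hits are visible."""
    def __init__(self):
        self.data = {}
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self.data.get(key)

    def save(self, key, value):
        self.data[key] = value


class CachedRepository:
    """Cache-aside layer: the middle tier talks only to this class, never to the store directly."""
    def __init__(self, store: BackingStore, ttl_seconds: float = 60.0):
        self._store = store
        self._ttl = ttl_seconds
        self._cache = {}  # key -> (expires_at, value); an in-process stand-in for Memcached

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                  # cache hit: no trip to the store
        value = self._store.load(key)        # cache miss: read through to the store
        if value is not None:
            self._cache[key] = (time.monotonic() + self._ttl, value)
        return value

    def put(self, key, value):
        self._store.save(key, value)         # the store stays the source of truth
        self._cache.pop(key, None)           # invalidate so the next read refetches


store = DictStore()
repo = CachedRepository(store)
repo.put("user:42", "Alice")
repo.get("user:42")   # miss: loads from the store
repo.get("user:42")   # hit: served from the cache
print(store.reads)    # 1
```

Since callers only see `CachedRepository`, swapping the backing store, or replacing the dict with a real Memcached client, doesn't ripple through the application. Invalidating on write rather than updating the cache keeps the store as the single source of truth.
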
Unfortunately, you might be right. Luckily, we did build a good separation layer for the caching system, so it can be done as you suggested (that's why we built it in the first place).
Aviad Ben Dov