Hi,

There seems to be a big push for key/value-based databases, which I believe memcached is.

Is the value usually some sort of collection or XML file that holds more meaningful data?

If so, is it generally faster to deserialize that data than to do traditional JOINs and SELECTs on tables that return a row-based result set?

+3  A: 

As with most things, "it depends". If the joins are relatively inconsequential (that is, a small number of joins on well-keyed data), and you are storing especially complex data, it may be better just to stick with the more complex query.

It's also a matter of freshness. Often the purpose of many joins is to bring together very disparate data, that is, data whose freshness varies widely. Keeping a key/value table synchronized when a small slice of the data, spread across a large number of pairs, is updated can add considerable complexity and overhead. System complexity can often be considered a form of performance cost: the time, risk and cost of changing a complex system without hurting performance are often far greater than for a simple one.
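To make that synchronization cost concrete, here is a minimal Python sketch. It assumes a hypothetical key/value store modelled as a plain dict of JSON blobs, where each order value embeds a copy of the customer's name; `kv_store` and `rename_customer` are illustrative names, not any real product's API. Renaming one customer means finding, deserializing and rewriting every value that holds a copy:

```python
import json

# Hypothetical key/value store: a plain dict mapping string keys to JSON blobs.
# Each order value embeds a denormalized copy of the customer's name.
kv_store = {
    "order:1": json.dumps({"customer_id": 7, "customer_name": "Ann", "total": 10}),
    "order:2": json.dumps({"customer_id": 7, "customer_name": "Ann", "total": 25}),
    "order:3": json.dumps({"customer_id": 9, "customer_name": "Bob", "total": 5}),
}


def rename_customer(store, customer_id, new_name):
    """Rewrite every value that embeds a copy of the customer's name.

    Returns how many values had to be deserialized, changed and re-serialized.
    With no secondary index, every value in the store has to be scanned.
    """
    rewritten = 0
    for key, raw in list(store.items()):
        value = json.loads(raw)
        if value["customer_id"] == customer_id:
            value["customer_name"] = new_name
            store[key] = json.dumps(value)
            rewritten += 1
    return rewritten
```

Here `rename_customer(kv_store, 7, "Anne")` has to rewrite two values; in a normalized schema the same change is a single-row `UPDATE`.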

The best solution is always to code what works as simply as you can. In most cases I'd say this means create a fully normalized database design and join the crap out of it. Only revisit your design once performance becomes an obvious problem. When you analyze the issue, it will also be obvious where the problems lie and what needs to be done to fix them. If the fix is reducing joins, then so be it. You'll know when you need to know.
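A sketch of what "fully normalized, then join" might look like, using Python's built-in `sqlite3` module and an in-memory database. The `customers`/`orders` schema is hypothetical. Each fact is stored once, and the join reassembles the combined view at query time:

```python
import sqlite3

# In-memory SQLite database with a small, fully normalized schema
# (hypothetical tables, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (7, 'Ann'), (9, 'Bob');
    INSERT INTO orders VALUES (1, 7, 10), (2, 7, 25), (3, 9, 5);
""")

# The customer's name lives in exactly one row, so renaming is one UPDATE.
conn.execute("UPDATE customers SET name = ? WHERE id = ?", ("Anne", 7))

# A join brings the data back together when it is read.
rows = conn.execute("""
    SELECT o.id, c.name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
```

After the single `UPDATE`, `rows` is `[(1, 'Anne', 10), (2, 'Anne', 25), (3, 'Bob', 5)]`.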

Rex M
I don't quite agree with the "..join the crap out of it." part. My experience tells me joins should be done sensibly. Too much normalization is almost always a bad thing.
Rosdi
+2  A: 

I don't have a lot of experience with key/value dbs, so take what I say with a grain of salt.

With that said, the first thing I should point out is that memcached isn't a key/value database. A database implies some kind of persistent store, which memcached isn't: it is intended to be a temporary, in-memory store that saves you from repeating queries against the actual database.

Other than that, my understanding is that you're not going to be able to replace your RDBMS with a key/value database. They tend to be best for unstructured data or other data where you may not know all the attributes that need to be stored. If you need to store highly-structured data, you can't do much better than a traditional RDBMS.
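As a rough illustration of that trade-off, here is a Python sketch assuming a hypothetical key/value store (a dict of JSON strings, with illustrative `put`, `get` and `find` helpers). Records with different attributes need no schema change, but an ad hoc query has to deserialize and test every value, which is the kind of work an RDBMS with indexes handles for structured data:

```python
import json

# Hypothetical key/value store: keys map to JSON strings whose shape may vary.
store = {}


def put(key, attributes):
    """Serialize and store a record; no schema has to be declared first."""
    store[key] = json.dumps(attributes)


def get(key):
    """Fetch and deserialize a single record by key (the fast path)."""
    return json.loads(store[key])


def find(predicate):
    """Ad hoc query: without an index, every value is deserialized and tested."""
    return sorted(key for key, raw in store.items() if predicate(json.loads(raw)))


# Two records with different attributes, stored side by side.
put("product:1", {"name": "Kettle", "watts": 2000})
put("product:2", {"name": "T-shirt", "size": "M", "colour": "blue"})
```

Lookups by key stay cheap, while `find(lambda v: v.get("watts", 0) > 1000)` has to touch every record to return `["product:1"]`.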

Jason Baker
+6  A: 

What has happened is that some really, really, REALLY big web sites like Google and Amazon occupy a teeny, tiny niche where their data storage and retrieval requirements are so different to anyone else's that a new way of storing/retrieving data is called for. I'm sure these guys know what they are doing; they are very good at what they do.

However, this then gets picked up, reported on and distorted into "relational databases aren't up to handling data for the web". Readers also start to think "hey, if relational databases aren't good enough for Amazon and Google, they aren't good enough for me."

These inferences are both wrong: 99.9% of all databases (including those behind web sites) are not in the same ballpark as Amazon and Google, not within several orders of magnitude. For this 99.9%, nothing has changed; relational databases still work just fine.

Tony Andrews
Amen, brother! :-)
ObiWanKenobi
So my web applications will work just fine with MySQL and (possibly) Memcached?
Rosdi
I would imagine so, yes. I don't know anything about Memcached, but having just Googled it, I see it is simply a mechanism for "remembering" values once they have been retrieved from the database, rather than repeatedly going back to the database to get them. It has nothing to do with key/value databases AFAICT. Such caching is probably sensible if used judiciously: don't use it for data that is likely to have changed since it was last accessed (unless, of course, you don't care that it has).
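One common way to follow that advice is to invalidate the cached copy whenever the underlying data is written. A minimal Python sketch, with plain dicts standing in for the database and for memcached; `read_price` and `update_price` are hypothetical helpers:

```python
# Hypothetical in-memory stand-ins for the database and for memcached.
database = {"price:42": 100}
cache = {}


def read_price(item_id):
    """Serve from the cache when possible, otherwise load and remember."""
    key = f"price:{item_id}"
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


def update_price(item_id, new_price):
    """Write to the database first, then drop the now-stale cached copy."""
    key = f"price:{item_id}"
    database[key] = new_price
    cache.pop(key, None)  # the next read repopulates the cache from the database
```

For data where some staleness is acceptable, giving each cache entry an expiry time is the simpler alternative.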
Tony Andrews
+1  A: 

They can be complex structured data that needs deserialization. They can also be simple fixed-size records, just like your RDBMS. Part of the benefit is that you get to make that decision yourself. When you're optimizing your database, you're not limited to what SQL can do.
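For the "simple fixed-size records" case, Python's standard `struct` module gives a compact illustration. The record layout below (a 32-bit id, a 64-bit balance in cents and a 16-byte name) is a hypothetical example, not something any particular key/value store prescribes. Every value is exactly 28 bytes and decoding is a single unpack, with no nested structure to parse:

```python
import struct

# Hypothetical fixed-size layout: "<" = little-endian with no alignment padding;
# I = unsigned 32-bit id, q = signed 64-bit balance in cents,
# 16s = 16-byte name (struct pads with NUL bytes, or truncates if longer).
RECORD = struct.Struct("<Iq16s")


def encode(account_id, balance_cents, name):
    """Pack one record into exactly RECORD.size (28) bytes.

    Names longer than 16 bytes are silently truncated, which could split a
    multi-byte UTF-8 character; this sketch assumes short ASCII names.
    """
    return RECORD.pack(account_id, balance_cents, name.encode("utf-8"))


def decode(raw):
    """Unpack a record and strip the NUL padding from the name."""
    account_id, balance_cents, name = RECORD.unpack(raw)
    return account_id, balance_cents, name.rstrip(b"\x00").decode("utf-8")
```

`decode(encode(1, -250, "ann"))` round-trips to `(1, -250, "ann")`.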

The way you ask makes it sound like the join or the deserialization will always be the bottleneck. But in any database, things are never that simple. You can put denormalized data in your RDBMS, too, or write an RDBMS interface on top of a key-value database, if you really want.
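The second option, building a more relational interface on top of a key/value store, can be sketched in miniature. The hypothetical `IndexedKV` class below adds one secondary index so that a "WHERE city = ?" style lookup does not have to scan every value; a real layer would need much more (for example transactions and more than one index), which this toy ignores:

```python
import json
from collections import defaultdict


class IndexedKV:
    """Toy key/value store with one secondary index (hypothetical class)."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.data = {}                 # primary key -> JSON blob
        self.index = defaultdict(set)  # indexed field value -> set of primary keys

    def put(self, key, record):
        old = self.data.get(key)
        if old is not None:
            # Remove the stale index entry before overwriting the record.
            self.index[json.loads(old).get(self.indexed_field)].discard(key)
        self.data[key] = json.dumps(record)
        self.index[record.get(self.indexed_field)].add(key)

    def where(self, value):
        """Return matching records, sorted by key, using only the index."""
        return [json.loads(self.data[k]) for k in sorted(self.index.get(value, ()))]


users = IndexedKV("city")
users.put("user:1", {"name": "Ann", "city": "Oslo"})
users.put("user:2", {"name": "Bob", "city": "Lima"})
users.put("user:1", {"name": "Ann", "city": "Lima"})  # Ann moves; index is updated
```

After Ann's move, `users.where("Lima")` returns both users and `users.where("Oslo")` returns an empty list. Keeping the index consistent on every write is exactly the kind of bookkeeping an RDBMS does for you.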