views:

279

answers:

2

I can see myself using Project Voldemort to cache results from a traditional RDBMS query. But in that case, it provides no major advantage over other (Java) caching systems such as Ehcache, JCache, etc.

Where else could I use Project Voldemort or similar key/value stores? How are you using them in your business applications?

+1  A: 

Project Voldemort is part of the NoSQL movement. Trends in computer architecture are pushing databases in a direction that requires horizontal scalability, and NoSQL attempts to address this requirement.

Among the claimed benefits of such key/value stores is the ability to blow through enormous amounts of data without the overhead of a traditional RDBMS.

http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_

Robert Harvey
I understand that this is a cool technology. I want to know how normal businesses are using it. By normal businesses, I mean non-Google and non-Facebook-type companies who have been using traditional relational databases. I am looking for use cases or scenarios where people have leveraged this technology for "normal" businesses.
See also Bigtable: http://jetfar.com/bigtable-and-why-it-changes-everything/ and http://labs.google.com/papers/bigtable.html
Robert Harvey
A: 

One approach to improving the speed of your database is to denormalize. Take this MySQL example:

CREATE TABLE `users` (
    `user_id` INT NOT NULL AUTO_INCREMENT,
    … -- Additional user data
    PRIMARY KEY (`user_id`)
);


CREATE TABLE `roles` (
    `role_id` INT NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(64),
    PRIMARY KEY (`role_id`)
);


CREATE TABLE `users_roles` (
    `user_id` INT NOT NULL,
    `role_id` INT NOT NULL,
    PRIMARY KEY (`user_id`, `role_id`)
);

Neat, tidy, normalized. But if you want to get users and their roles, the query is complex:

SELECT u.*, r.*
  FROM `users` u
  LEFT JOIN `users_roles` ur ON u.`user_id` = ur.`user_id`
  LEFT JOIN `roles` r ON ur.`role_id` = r.`role_id`;

If you denormalized this, it might look something like:

CREATE TABLE `users` (
    `user_id` INT NOT NULL AUTO_INCREMENT,
    `role` VARCHAR(64),
    … -- Additional user data
    PRIMARY KEY (`user_id`)
);

And the equivalent query would be:

SELECT * FROM `users`;

This improves some of the performance characteristics of your queries:

  1. Because the result you want is already in a table, you don't have to perform read-side calculations. e.g. if you wanted to see the number of users with a given role, you'd need a GROUP BY and COUNT. If it were denormalized, you would store it in a different table devoted to holding roles and counts of users who have that role.
  2. The data you want is in the same place, and hopefully in the same place on disk. Rather than requiring many random seeks, you can do one to a few sequential reads.
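To make point 1 concrete, here's a small runnable sketch (using Python's built-in sqlite3 so it's self-contained; the `role_counts` table and its columns are hypothetical names, not part of the schema above) of keeping that per-role count in its own table, maintained at write time:

```python
import sqlite3

# Illustrative only: a per-role user count kept in its own table,
# updated on every write, so reads never need GROUP BY + COUNT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, role TEXT);
    CREATE TABLE role_counts (role TEXT PRIMARY KEY, user_count INTEGER);
""")

def add_user(role):
    # The write side does the extra work: insert the user AND bump the count.
    conn.execute("INSERT INTO users (role) VALUES (?)", (role,))
    cur = conn.execute(
        "UPDATE role_counts SET user_count = user_count + 1 WHERE role = ?",
        (role,))
    if cur.rowcount == 0:  # first user with this role
        conn.execute(
            "INSERT INTO role_counts (role, user_count) VALUES (?, 1)",
            (role,))

for r in ["admin", "editor", "admin"]:
    add_user(r)

# The read side is a plain lookup instead of an aggregation.
count = conn.execute(
    "SELECT user_count FROM role_counts WHERE role = ?",
    ("admin",)).fetchone()[0]
print(count)  # 2
```

The read query is now a single indexed lookup, at the cost of two statements per write.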

NoSQL DBs are highly optimized for these cases, where you want to access a mostly static, sequential dataset. At that point, the database is just moving bytes from disk to the network: less work, less overhead, more speed. Simple as this sounds, it's quite possible to model your data and application so that working this way feels natural.
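The access pattern this enables looks roughly like the sketch below: the "query result" (a user plus their roles) is pre-joined at write time and stored as one value under one key, so a read is a single lookup with no joins. A plain dict stands in for the store here; Voldemort's client exposes a broadly similar get/put interface, but the key format and record shape are my invention:

```python
import json

# A dict standing in for a key/value store. Each value is a fully
# denormalized record, serialized as one blob: one dataset per query.
store = {}

def put_user(user_id, user):
    # Pre-join at write time; the key encodes the query we'll run later.
    store["user:%d" % user_id] = json.dumps(user)

def get_user(user_id):
    # Reading back is one key lookup, no joins, no aggregation.
    return json.loads(store["user:%d" % user_id])

put_user(42, {"name": "alice", "roles": ["admin", "editor"]})
print(get_user(42)["roles"])  # ['admin', 'editor']
```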

The trade-off for this performance is write load, disk space, and some application complexity. Denormalizing your data means keeping more copies, which means more disk space and more write load; essentially, you have one dataset per query. Because you shift the burden of those computations to write time instead of read time, you really need some sort of asynchronous mechanism to do the work, hence the added application complexity.
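A minimal sketch of that asynchronous mechanism (all names here are illustrative): writes go onto a queue, and a background worker fans each logical write out to every denormalized, per-query copy, so the request path doesn't pay for all of them:

```python
import queue
import threading

tasks = queue.Queue()
users_by_id = {}    # one copy, keyed for the "get user" query
users_by_role = {}  # a second copy, keyed for the "users in role" query

def worker():
    # Background worker: drains the queue and updates every copy.
    while True:
        user = tasks.get()
        if user is None:  # sentinel: shut down
            break
        users_by_id[user["id"]] = user
        users_by_role.setdefault(user["role"], []).append(user["id"])
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# The request path only enqueues; the fan-out happens off-thread.
tasks.put({"id": 1, "role": "admin"})
tasks.put({"id": 2, "role": "admin"})
tasks.put(None)
t.join()

print(users_by_role["admin"])  # [1, 2]
```

In a real system the queue would be durable (a log or message broker) rather than in-process, but the shape is the same.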

And because you have to store more copies, you have to perform more writes. This is why you can't practically replicate this kind of architecture with a SQL database – it's extremely difficult to scale writes.

In my experience, the trade-off is well worth it for a large-scale application. If you'd like to read a bit more about a practical application of Cassandra, I wrote this piece a few months ago, and you might find it helpful.

ieure