I have heard that Amazon uses a key-value data store rather than a traditional normalized relational database. Speaking as someone who has only used the traditional approach, how does this work? Don't you need to run queries like the following?

select * from book where book_id = n

Or:

select * from book where author_id = y

How can you build a site/app with so much data and so many relationships without a normalized db?

+4  A: 

It only uses its Dynamo key-value store for its shopping cart and select other applications.

Michael Greene
+5  A: 

You may want to start your research by checking out these Stack Overflow articles:

Amazon offers a key-value-style datastore to the public as SimpleDB, which is part of Amazon Web Services.

Also note that Google offers a similar "sparse, distributed multi-dimensional sorted map" datastore for the Google App Engine.

Daniel Vassallo
+5  A: 

The Amazon.com architecture is very interesting. They moved to a service-oriented architecture: if you look at all the different content areas on their site, each one is served by a different service. So there is a 'wish list' service, a 'Related to Items You've Viewed' service, a Bestsellers service, a Shopping cart service, etc.

Each of the services has its own set of requirements and features. The requirements include things like response time and availability. Internally, each service is implemented using whatever database best suits its needs. The key-value store is good for a shopping cart, because you never need to do:

select * from book where book_id = n

on a shopping cart.
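To make that concrete, here is a minimal sketch of a cart kept in a key-value store. The `KeyValueStore` class, the `cart:<customer_id>` key format, and `add_to_cart` are all hypothetical names made up for illustration; this is a toy in-memory stand-in, not Dynamo's actual API. The point is only that every cart operation is a single get or put on one key.

```python
import json


class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical, not Dynamo's API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Values are stored as opaque serialized blobs, as in a typical key-value store.
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def put(self, key, value):
        self._data[key] = json.dumps(value)


def add_to_cart(store, customer_id, book_id, quantity=1):
    """Read the whole cart by key, update it, and write it back."""
    key = f"cart:{customer_id}"  # assumed key format
    cart = store.get(key) or {}
    cart[str(book_id)] = cart.get(str(book_id), 0) + quantity
    store.put(key, cart)
    return cart


store = KeyValueStore()
add_to_cart(store, "alice", 42)
add_to_cart(store, "alice", 42)
add_to_cart(store, "alice", 7, quantity=3)
print(store.get("cart:alice"))  # {'42': 2, '7': 3}
```

There is no query by author or by any other attribute here: the cart is always fetched by the one key the service already knows.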

One of the important things to realize is the enormous role that availability plays at Amazon's scale. Consider that Amazon's 2008 revenue was $19.166 billion. The total retail revenue from the Amazon.com site may be more than $1000 per second during the day (it may be double that, for all I know, during peak hours, and it could be 5 times that during peak holiday shopping). Think of the cost if the shopping cart service goes down for 3 minutes during peak usage: the loss in abandoned carts would clearly be a large dollar value.
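A rough back-of-the-envelope version of that arithmetic, as a sketch. It assumes all of the 2008 revenue is spread evenly over the year (which overstates the retail site's share and ignores daily peaks), and it treats every second of downtime as fully lost sales, so the numbers are only an order-of-magnitude guide. The function name `outage_loss` is made up for this example.

```python
ANNUAL_REVENUE = 19.166e9            # Amazon's 2008 revenue in dollars, from the text above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Average revenue per second if sales were spread evenly over the whole year.
avg_per_second = ANNUAL_REVENUE / SECONDS_PER_YEAR


def outage_loss(dollars_per_second, outage_seconds):
    """Revenue at risk, assuming every second of downtime is fully lost sales."""
    return dollars_per_second * outage_seconds


print(f"average: ${avg_per_second:,.0f} per second")
print(f"3 minutes at $1000/s: ${outage_loss(1000, 3 * 60):,.0f}")
print(f"3 minutes at 5x that: ${outage_loss(5000, 3 * 60):,.0f}")
```

The yearly average works out to roughly $600 per second, which is why a daytime figure above $1000 per second is plausible, and a 3-minute outage at that rate puts a six-figure sum at risk.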

Using a key-value store doesn't mean embracing rampant data duplication; it means redesigning applications so the necessary data doesn't need to sit all in one monolithic database.
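One way to picture this, as a sketch under assumptions: each service owns its own store, and the cart holds only book IDs rather than copies of book records. The `CatalogService` and `CartService` classes and the `render_cart` helper are hypothetical names for illustration, not Amazon's actual services or interfaces.

```python
class CatalogService:
    """Owns book records, keyed by book_id (hypothetical service)."""

    def __init__(self):
        self._books = {}

    def add_book(self, book_id, title, author_id):
        self._books[book_id] = {"title": title, "author_id": author_id}

    def get_book(self, book_id):
        return self._books[book_id]


class CartService:
    """Owns carts, keyed by customer_id; stores only book IDs, not book data."""

    def __init__(self):
        self._carts = {}

    def add(self, customer_id, book_id):
        self._carts.setdefault(customer_id, []).append(book_id)

    def items(self, customer_id):
        return list(self._carts.get(customer_id, []))


def render_cart(cart, catalog, customer_id):
    """Compose the page by key lookups against each service, with no join."""
    return [catalog.get_book(book_id)["title"] for book_id in cart.items(customer_id)]


catalog = CatalogService()
catalog.add_book(1, "Book One", author_id=10)
catalog.add_book(2, "Book Two", author_id=11)

cart = CartService()
cart.add("alice", 2)
cart.add("alice", 1)
print(render_cart(cart, catalog, "alice"))  # ['Book Two', 'Book One']
```

The book titles live in exactly one place, the catalog, and the cart service can stay up and keep accepting items even if the catalog's store is a completely different kind of database.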

Amazon is really more of a platform for applications than anything else. Here is a video of Amazon's CTO talking about just that.

Mocky