views:

690

answers:

2

I am in the early stages of design of an application that has to be highly available and scalable. I want to use an eventual consistency data model for this for a number of reasons. I know and understand why this is an unpopular architectural choice for many solutions, but it's important in my case.

I am looking for real-world advice, best-practices and gotchas to look out for when dealing with distributed / document-style databases. And particularly areas around e-commerce (shopping cart style) apps that traditionally are easier to put together with a relational db.

I understand using these types of DB is challenging, but hey, Google and E-bay use them so they can't be that hard ;-) Any advice would be appreciated.

+1  A: 

How to acheive high availability and scalability using relational databases is well known and there is a vast body of knowledge out there on how to do this!

Google is a special case which does not apply to most sites, very very high volumes of queries, very very large amounts of data, and, most importalntly no Sservice Level Agreements with most of its users. There is no correct answer to a Web search only better answers, for the average user google is good enough, if google misses a vital page from a search list you as a user cannot complain.

E-Bay is a rather different case, somehow they have persauded there users and customers to accpet poor service in exchange for theortically lower prices -- good on them but this is not an option for every business.

James Anderson
+7  A: 

If you want to have a Distributed System (that "Eventual Consistency" thing) you need people, build, maintain and to operate it.

I found that there are three classes of people which have very little problems with "Eventual Consistency":

  • People with a solid background in distributed systems. They have learned about Eventual Consistency Byzantine Failures and stuff like that. If you understand that Paxos is not about holidays, you are probably one of them.
  • People experienced in network programming. They might miss the theoretical background but have an intuitive understanding of asynchronity and the "no global clocks & counters" paradigm. If you own at least 8 books by Richard Stevens you are probably one of them.
  • Very experienced coders which had little exposure to RDBMS. Kernel guys, people from scientific computing and the gaming industry come to mind.

All in all this people are very sought after in the job market. For example 75% or so of the academics in distributed systems leave for institutions who run big, self-designed distributed systems, e.g. the stock exchanges.

The whole thing got somewhat simpler with offerings like Hardoop, SimpleDB and CouchDB but it is still a big challenge to build something on distributed systems technology.

On the other Hand RDBMS are a very fine pice of engineering. They are well understood and expertise on them is available the job market. There are a lot of decent tools, education opportunities and lots of highly skilled experts are available to be rented by the hour. So think twice of you can't get on with a RDBMS approach - perhaps coupled with some clever cheating. I usually point students to the Lifejournal architecture.

For Distributed Databases there is much less experience. That's exactly the reason you have found so little advice so far.

If you are determined to use "Eventual Consistency" I think besides immature tools the main challenge is the mindset of everyone involved. Are your API users (coders) and application users (your employees and your customers) are willing and able to accept the inconsistency? Can you hide it from certain classes of users? We are not used to that mindset that computers are inconsistent. Something is in stock or it isn't. "Maybe" isn't an answer users expect.

Also keep in mind that "eventual" can mean a very long time to algorithm designers. For how long can you accept inconsistency?

For a shopping cart application you might want to go truly distributed: Use the Clients Browser as data store. On checkout you can submit the cart to the server side batch processing system. This means for the catalog you need read only high availability (easier) and the cart submission is a very narrow interface with no need for transactions. Later on the processing of the order has no (Soft) real time requirements and thus is easier.

BTW: Last time I checked on E-Bay architecture they where big in RDBMS but it may have changed since then. (Edit: it did change - see comments)

mdorseif
see http://www.infoq.com/articles/ebay-scalability-best-practices
frank06
I assume part of this is tongue-in-cheek: according to his own web page, W. Richard Stevens has only published seven books!
James A. Rosen