I've been working on a project that combines an application server and an object database, and it currently runs on a single machine only. Some time ago I read a paper describing a distributed relational database, and it gave me some ideas on how to apply its approach to my project, so that I could make a high-availability version that runs on a cluster using a shared-nothing architecture.

My problem is that I don't have experience in designing distributed systems and their protocols; I did not take the advanced CS courses on distributed systems at university. So I'm worried about whether I can design a protocol that does not cause deadlock, starvation, split brain and other problems.

Question: Where can I find good material about designing distributed systems? What methods are there for verifying that a distributed protocol works correctly? Recommendations of books, academic articles and other resources are welcome.

+3  A: 

I liked the book Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen.

starblue
That appears to be the same book that they use at my university. The reviews on Amazon are mixed, so I might not buy it, but I will have a look at it in the university library.
Esko Luontola
A: 

At a more abstract and formal level, Communicating and Mobile Systems: The Pi-Calculus by Robin Milner gives a calculus for verifying systems. There are variants of the pi-calculus for verifying protocols, such as the SPI-calculus (the Wikipedia page for which has disappeared since I last looked), and implementations, some of which are also verification tools.

Pete Kirkham
Interesting. I'll have a look at that.
Esko Luontola
+4  A: 

Learning distributed computing isn't easy. It's a very vast field covering communication, security, reliability, concurrency and more, each of which would take years to master. Understanding will eventually come through a lot of reading and practical experience. You seem to have a challenging project to start with, so here's your chance :)

The two most popular books on distributed computing are, I believe:

1) Distributed Systems: Concepts and Design - George Coulouris et al.

2) Distributed Systems: Principles and Paradigms - A. S. Tanenbaum and M. Van Steen

Both books give a very good introduction to the approaches (including communication protocols) currently used to build successful distributed systems. I've mostly used the latter and have found it to be an excellent text. If the reviews on Amazon aren't very good, it's because most readers compare this book to A. S. Tanenbaum's other books, which are quite frankly better written (IMO he is one of the best authors in the field of computer science).

PS: I really question your need to design and verify a new protocol. If you are working with application servers and databases, what you need is probably already available.

Mystic
Intellectual challenge was my primary reason for starting this project - it's the most complex program I've done. Even if nobody will use it, I will learn lots about distributed systems and other complex topics. :)
Esko Luontola
+2  A: 

I learned a lot by looking at what has been published about really huge web-based platforms, and especially at how their systems evolved over time to cope with their growth.

Here are some examples I found enlightening:

  • eBay Architecture: Nice history of their architecture and the issues they had. Obviously they can't use a lot of caching for the auctions and bids, so their story differs on that point from many others. As of 2006, they deployed 100,000 new lines of code every two weeks and were able to roll back an ongoing deployment if issues arose.

  • Paper on Google File System: Nice analysis of what they needed, how they implemented it and how it performs in production use. After reading this, I found it less scary to build parts of the infrastructure myself to meet exactly my needs, if necessary, and I saw that such a solution can and probably should be quite simple and straightforward. There is also a lot of interesting material on the net (including YouTube videos) on BigTable and MapReduce, other important parts of Google's architecture.

  • Inside MySpace: One of the few really huge sites built on the Microsoft stack. You can learn a lot about what not to do with your data layer.

A great starting point for finding many more resources on this topic is the Real Life Architectures section on the "High Scalability" web site. For example, they have a good summary of Amazon's architecture.

markus
+1  A: 

One good book is Birman's Reliable Distributed Systems, although it has its detractors.

If you want to formally verify your protocol you could look at some of the techniques in Lynch's Distributed Algorithms.
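To give a rough feel for what mechanical checking of a protocol can look like, here is a minimal sketch of exhaustive state-space exploration. It is not taken from Lynch's book. The protocol is a made-up toy (each process acquires a fixed sequence of locks, then releases them all at once), and the function name `explore` and the state encoding are assumptions chosen purely for illustration. It enumerates every interleaving and reports states in which no process can move although some process has not finished, i.e. deadlocks.

```python
from collections import deque


def explore(orders):
    """Breadth-first search over all interleavings of a toy locking protocol.

    orders[p] is the sequence of locks process p acquires, in order.
    A process state k means: k locks acquired (k <= len(order)), or
    len(order) + 1 meaning "released everything and finished".
    Returns the list of deadlocked states found.
    """
    n = len(orders)
    done = tuple(len(order) + 1 for order in orders)
    start = tuple(0 for _ in range(n))
    seen = {start}
    queue = deque([start])
    deadlocks = []

    while queue:
        state = queue.popleft()

        # Locks held right now: the acquired prefix of every unfinished process.
        held = set()
        for p in range(n):
            if state[p] <= len(orders[p]):
                held.update(orders[p][:state[p]])

        successors = []
        for p in range(n):
            k = state[p]
            if k < len(orders[p]):
                # Next step is an acquire; enabled only if the lock is free.
                if orders[p][k] in held:
                    continue
            elif k == len(orders[p]):
                # Next step releases all locks; always enabled.
                pass
            else:
                # Process p has already finished.
                continue
            successors.append(state[:p] + (k + 1,) + state[p + 1:])

        # No enabled step, yet not every process has finished: a deadlock.
        if not successors and state != done:
            deadlocks.append(state)

        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return deadlocks


# Two processes taking the same two locks in opposite order can deadlock;
# taking them in the same order cannot.
print(explore([("a", "b"), ("b", "a")]))
print(explore([("a", "b"), ("a", "b")]))
```

Real verification tools use the same basic idea (enumerate reachable states, check a property in each one) but add specification languages, liveness checking and state-space reduction, so treat this only as an intuition builder.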

It is likely that whatever protocol you are trying to implement has been designed and analysed before. I'll just plug my own blog, which covers topics such as consensus algorithms.

HenryR