I am about to start writing a system that needs to rebalance its load distribution among the remaining nodes once one or more of the nodes involved fails. Does anyone have good references on what to avoid and what works?

In particular, I'm curious how one should start building such a system so that it can be unit-tested.

+1  A: 

This question smells like my distributed systems class. So I feel I should point out the textbook we used.

It covers many aspects of distributed systems at an abstract level, so a lot of its content would apply to what you're going to do.

It does a pretty good job of pointing out pitfalls and common mistakes, as well as giving possible solutions.

The first edition is available for free download from the authors.

The book doesn't really cover unit-testing of distributed systems, though. I could see an entire book written on just that.

Ben S
+1  A: 

This sounds like a task that involves a considerable degree of out-of-process communication and other environment-dependent code.

To make your code testable, it is important to abstract such code away from your main logic so that you can unit test the core engine without depending on any of these environment-specific details.

The recommended approach is to hide such components behind an interface that you can then replace with so-called Test Doubles in unit tests.
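As a rough illustration of that idea, here is a minimal Python sketch. Everything in it is hypothetical and only meant to show the shape: `NodeTransport` stands in for whatever interface hides your real networking code, `Rebalancer` is a placeholder core engine that uses a simple round-robin policy (your actual balancing strategy will likely differ), and `FakeTransport` is the Test Double that lets a unit test simulate a node failure without any real nodes.

```python
from typing import Dict, List, Protocol


class NodeTransport(Protocol):
    """Hypothetical boundary around all out-of-process communication."""

    def live_nodes(self) -> List[str]: ...

    def assign(self, node: str, tasks: List[str]) -> None: ...


class Rebalancer:
    """Core logic: spreads tasks round-robin over whichever nodes are alive."""

    def __init__(self, transport: NodeTransport) -> None:
        self._transport = transport

    def rebalance(self, tasks: List[str]) -> Dict[str, List[str]]:
        nodes = sorted(self._transport.live_nodes())
        if not nodes:
            raise RuntimeError("no live nodes to take the load")
        plan: Dict[str, List[str]] = {node: [] for node in nodes}
        for i, task in enumerate(tasks):
            plan[nodes[i % len(nodes)]].append(task)
        # Push the new plan out through the abstraction, never directly.
        for node, assigned in plan.items():
            self._transport.assign(node, assigned)
        return plan


class FakeTransport:
    """Test Double: an in-memory stand-in that never touches the network."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.assignments: Dict[str, List[str]] = {}

    def live_nodes(self) -> List[str]:
        return list(self.nodes)

    def assign(self, node: str, tasks: List[str]) -> None:
        self.assignments[node] = list(tasks)

    def fail(self, node: str) -> None:
        """Simulate a node dropping out of the cluster."""
        self.nodes.remove(node)
        self.assignments.pop(node, None)


def test_load_moves_to_survivors_after_failure() -> None:
    transport = FakeTransport(["a", "b", "c"])
    rebalancer = Rebalancer(transport)
    tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]

    rebalancer.rebalance(tasks)
    transport.fail("b")
    plan = rebalancer.rebalance(tasks)

    # Every task must end up on a surviving node, and none on the failed one.
    assert set(plan) == {"a", "c"}
    assert sorted(plan["a"] + plan["c"]) == tasks
    assert transport.assignments == plan
```

The point of the design is that `Rebalancer` only ever sees the interface, so the production implementation (sockets, RPC, whatever you end up using) can be swapped in later without touching the logic under test.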

The book xUnit Test Patterns covers many of these things, and much more, very well.

Mark Seemann