My boss wants a system that takes a continent-wide catastrophic event into account. He wants two servers in the US and two servers in Asia (one login server and one worker server on each continent).

  1. In the event that an earthquake breaks the connection between the two continents, each side should keep working on its own. When the connection is restored, they should sync with each other and return to normal.
  2. An external cloud system is not allowed, as he has no confidence in one.
  3. The system should take scalability into account, which means adding new servers should be easy to configure.
  4. The servers should be load balanced.
  5. The connection between the servers should be very secure (encrypted and sent over SSL, even though SSL already takes care of the encryption).
  6. The system should allow one and only one user to be logged in to an account at a time. (Beware of the latency between continents: two users sharing an account may reach both login servers at the same time.)

Please help. I'm already at my wit's end. Thank you in advance.

+1  A: 

You could contact one of the solid, experienced hosting providers (we use Rackspace) that have data centers in different regions worldwide and get their recommendations based on your requirements.

Gennady Shumakher
+1  A: 

This is another one of those cases where employers tend not to understand the benefits of using an off-the-shelf solution. If you as a programmer don't really know where to start with this, then rolling your own is probably going to be a huge money and time sink. There's nothing wrong with not knowing this stuff either; high-availability, failsafe networking that takes catastrophic failure of critical components into account is a large problem domain that many people pour a lot of effort and money into. Why not take advantage of what providers have to offer?

Give talking to your boss about using existing cloud providers one more try.

Dustman
+1  A: 

I suspect that these requirements (if properly analysed) are essentially incompatible, in that, by the CAP theorem, they cannot all be satisfied at once.

If you have several datacentres, even if they are close by, partitions WILL happen. If a partition happens, either availability OR consistency MUST be lost, because either:

  • you have a pre-determined "master", which keeps working, while the other "slave" DCs fail (or go read-only). This keeps consistency at the expense of availability.
  • OR you lose consistency for the duration of the partition (which means that operations depending on immediate consistency are also unavailable). A rough sketch of both options follows this list.
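
To make the trade-off concrete, here is a minimal sketch in Python of the two options above. It is not a production design, and everything in it (the DataCenter class, PRIMARY_DC, STRATEGY) is a hypothetical illustration rather than anything from the question's setup:

    import time

    PRIMARY_DC = "us"          # assumption: the US site is the pre-determined master
    STRATEGY = "consistency"   # or "availability"

    class DataCenter:
        def __init__(self, name):
            self.name = name
            self.read_only = False

        def on_partition_detected(self):
            """Called when the inter-continent link is lost."""
            if STRATEGY == "consistency" and self.name != PRIMARY_DC:
                # Option 1: keep consistency, lose availability.
                # The non-master site stops accepting writes.
                self.read_only = True
            # Option 2 (availability): keep accepting writes on both sides
            # and reconcile once the link comes back (sketched further down).

        def write(self, store, key, value):
            if self.read_only:
                raise RuntimeError(f"{self.name} is read-only during the partition")
            # Tag each write with a timestamp and origin so a later merge is possible.
            store[key] = (value, time.time(), self.name)

Whichever branch you pick, the point is that the choice has to be made explicitly; you cannot have both sides fully writable and perfectly consistent while the link is down.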

This is incompatible with your requirements, as far as I can see. What your boss wants is clearly impossible. He needs to understand the CAP theorem.

Now, in YOUR application's case, you may decide that you can bend the rules and redefine what consistency or availability mean, for convenience, and have a system which degrades into an inconsistent but temporarily acceptable state.
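
For example, the reconciliation pass when the link comes back could be as crude as last-write-wins on the timestamps recorded during the partition. The sketch below (the merge function and the stores are invented for this answer) only illustrates "temporarily inconsistent, reconciled later"; real systems tend to need vector clocks or application-level merge rules. It also shows one way requirement 6's duplicate-login race could be cleaned up after the fact:

    def merge(store_a, store_b):
        # Last-write-wins: for each key, keep whichever side wrote most recently.
        merged = dict(store_a)
        for key, (value, ts, origin) in store_b.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts, origin)
        return merged

    # Example: the same account logged in on both continents during the split.
    # The later (Asia) login wins; the US session would then be revoked.
    us_store = {"session:alice": ("token-us", 100.0, "us")}
    asia_store = {"session:alice": ("token-asia", 105.0, "asia")}
    print(merge(us_store, asia_store))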

You probably want to get product management to look at the business case for these requirements; dropping some of them is probably fine. Consistency is a good requirement to keep, as it makes things behave the way people expect, which means dropping availability or partition tolerance instead. Keeping consistency is definitely easier from an engineering perspective.

MarkR