Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

+6 Q:


We're developing a server system in Scala + Akka for a game that will serve clients in Android, iPhone, and Second Life. There are parts of this server that need to be highly available, running on multiple machines. If one of those servers dies (of, say, hardware failure), the system needs to keep running. I think I want the clients to have a list of machines they will try to connect with, similar to how Cassandra works.

The multi-node examples I've seen so far with Akka seem to me to be centered around the idea of scalability, rather than high availability (at least with regard to hardware). The multi-node examples seem to always have a single point of failure. For example there are load balancers, but if I need to reboot one of the machines that have load balancers, my system will suffer some downtime.

Are there any examples that show this type of hardware fault tolerance for Akka? Or, do you have any thoughts on good ways to make this happen?

So far, the best answer I've been able to come up with is to study the Erlang OTP docs, meditate on them, and try to figure out how to put my system together using the building blocks available in Akka.

But if there are resources, examples, or ideas on how to share state between multiple machines in a way that if one of them goes down things keep running, I'd sure appreciate them, because I'm concerned I might be re-inventing the wheel here. Maybe there is a multi-node STM container that automatically keeps the shared state in sync across multiple nodes? Or maybe this is so easy to make that the documentation doesn't bother showing examples of how to do it, or perhaps I haven't been thorough enough in my research and experimentation yet. Any thoughts or ideas will be appreciated.

+2 A:

If you're listing multiple potential hosts in your clients already, then those can effectively become load balancers.

You could offer a host suggestion service that recommends to the client which machine it should connect to (based on current load, or whatever); the client can then pin to that host until the connection fails.

If the host suggestion service is not there, then the client can simply pick a random host from its internal list, trying them until it connects.

Ideally on first time start up, the client will connect to the host suggestion service and not only get directed to an appropriate host, but a list of other potential hosts as well. This list can routinely be updated every time the client connects.

If the host suggestion service is down on the client's first attempt (unlikely, but...) then you can pre-deploy a list of hosts in the client install, so it can start randomly selecting hosts from the very beginning if it has to.

Make sure that your list of hosts is actual host names, and not IPs; that gives you more flexibility long term (i.e. you'll "always have" host1.example.com, host2.example.com... etc. even if you move infrastructure and change IPs).
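A rough sketch of that client-side selection logic in plain Scala might look like the following. This is only an illustration under stated assumptions, not Akka API: `HostSelector`, `suggest`, and `connect` are hypothetical names, and the suggestion service and the connection are passed in as functions so the real transport can be whatever the client uses.

```scala
import scala.util.{Failure, Random, Success, Try}

// Hypothetical helper, not part of Akka. C is whatever connection type the client uses.
final class HostSelector[C](
    bundledHosts: Seq[String],                 // host names pre-deployed with the client install
    suggest: () => Try[(String, Seq[String])], // suggestion service: (preferred host, other hosts)
    connect: String => Try[C],                 // tries to open a connection to one host
    rng: Random = new Random()
) {
  // Last known host list; starts out as the pre-deployed one.
  private var knownHosts: Seq[String] = bundledHosts

  def currentHosts: Seq[String] = knownHosts

  /** Returns the host that accepted the connection, or None if every host failed. */
  def connectAny(): Option[(String, C)] = {
    val candidates: Seq[String] = suggest() match {
      case Success((preferred, others)) =>
        // Refresh the stored list, then try the suggested host first.
        knownHosts = (preferred +: others).distinct
        (preferred +: rng.shuffle(others)).distinct
      case Failure(_) =>
        // Suggestion service unreachable: fall back to a random order of known hosts.
        rng.shuffle(knownHosts)
    }
    // The iterator is lazy, so no further hosts are tried after the first success.
    candidates.iterator
      .map(host => connect(host).toOption.map(conn => (host, conn)))
      .collectFirst { case Some(pair) => pair }
  }
}
```

The client would call `connectAny()` on startup and again whenever its pinned connection drops. Persisting `currentHosts` between runs is left out of the sketch.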

Will Hartung 2010-09-11 21:23:34

Thank you sir. I'll do as you suggest. Now I just have to figure out how to have those hosts share state amongst themselves, using either an active-active or active-passive kind of approach. It looks to me like I'll need to build that, and want to make sure I'm not building something that's already done and ready to use in Akka that escaped my notice.

Unoti 2010-09-11 21:47:13

You could take a look at how RedDwarf and its fork DimDwarf are built. They are both horizontally scalable crash-only game app servers, and DimDwarf is partly written in Scala (new messaging functionality). Their approach and architecture should match your needs quite well :)

puudeli 2010-09-12 09:28:20

Thanks for the tip!

Unoti 2010-09-12 17:47:18

+4 A:

Hello Unoti,

HA and load management are very important aspects of scalability and are available as part of the AkkaSource commercial offering.

Please contact me if you're interested,

Happy hAkking!

Cheers, Viktor

Viktor Klang 2010-09-12 18:13:31

Thank you, Viktor. I ran across this while researching this as well: http://groups.google.com/group/akka-user/browse_thread/thread/636e08b7199c9e46?fwc=2

Unoti 2010-09-12 20:06:38

+4 A:

I'm building this right now in the Akka commercial offering. Should be done soon. Email me or Viktor if you are interested.

Jonas Bonér 2010-09-13 05:38:09
