views:

318

answers:

3

I've to move a Windows based multi-threaded application (which uses global variables as well as an RDBMS for storage) to an NLB (i.e., network load balancer) cluster. The common architectural issues that immediately come to mind are

  • Global variables (which are both read/ written) will have to be moved to a shared storage. What are the best practices here? Is there anything available in Windows Clustering API to manage such things?

  • My application uses sockets, and persistent connections is a norm in the field I work. I believe persistent connections cannot be load balanced. Again, what are the architectural recommendations in this regard?

+4  A: 

I'll answer the persistent connection part of the question first since it's easier. All good network load-balancing solutions (including Microsoft's NLB service built into Windows Server, but also including load balancing devices like F5 BigIP) have the ability to "stick" individual connections from clients to particular cluster nodes for the duration of the connection. In Microsoft's NLB this is called "Single Affinity", while other load balancers call it "Sticky Sessions". Sometimes there are caveats (for example, Microsoft's NLB will break connections if a new member is added to the cluster, although a single connection is never moved from one host to another).

re: global variables, they are the bane of load-balanced systems. Most designers of load-balanced apps will do a lot of re-architecture to minimize dependence on shared state since it impedes the scalabilty and availability of a load-balanced application. Most of these approaches come down to a two-step strategy: first, move shared state to a highly-available location, and second, change the app to minimize the number of times that shared state must be accessed.

Most clustered apps I've seen will store shared state (even shared, volatile state like global variables) in an RDBMS. This is mostly out of convenience. You can also use an in-memory database for maximum performance. But the simplicity of using an RDBMS for all shared state (transient and durable), plus the use of existing database tools for high-availability, tends to work out for many services. Perf of an RDBMS is of course orders of magnitude slower than global variables in memory, but if shared state is small you'll be reading out of the RDBMS's cache anyways, and if you're making a network hop to read/write the data the difference is relatively less. You can also make a big difference by optimizing your database schema for fast reading/writing, for example by removing unneeded indexes and using NOLOCK for all read queries where exact, up-to-the-millisecond accuracy is not required.

I'm not saying an RDBMS will always be the best solution for shared state, only that improving shared-state access times are usually not the way that load-balanced apps get their performance-- instead, they get performance by removing the need to synchronously access (and, especially, write to) shared state on every request. That's the second thing I noted above: changing your app to reduce dependence on shared state.

For example, for simple "counters" and similar metrics, apps will often queue up their updates and have a single thread in charge of updating shared state asynchronously from the queue.

For more complex cases, apps may swtich from Pessimistic Concurrency (checking that a resource is available beforehand) to Optimistic Concurrency (assuming it's available, and then backing out the work later if you ended up, for example, selling the same item to two different clients!).

Net-net, in load-balanced situations, brute force solutions often don't work as well as thinking creatively about your dependency on shared state and coming up with inventive ways to prevent having to wait for synchronous reading or writing shared state on every request.

I would not bother with using MSCS (Microsoft Cluster Service) in your scenario. MSCS is a failover solution, meaning it's good at keeping a one-server app highly available even if one of the cluster nodes goes down, but you won't get the scalability and simplicity you'll get from a true load-balanced service. I suspect MSCS does have ways to share state (on a shared disk) but they require setting up an MSCS cluster which involves setting up failover, using a shared disk, and other complexity which isn't appropriate for most load-balanced apps. You're better off using a database or a specialized in-memory solution to store your shared state.

Justin Grant
+1  A: 
  1. Concurrency (Check out Apache Cassandra, et al)
  2. Speed of light issues (if going cross-country or international you'll want heavy use of transactions)
  3. Backups and deduplication (Companies like FalconStor or EMC can help here in a distributed system. I wouldn't underestimate the need for consulting here)
orokusaki
Thanks for the Apache link!
Jaywalker
No prob. It's in dev still, but I hear it's pretty robust.
orokusaki
+1  A: 

Regarding persistent connection look into the port rules, because port rules determine which tcpip port is handled and how.

MSDN:

When a port rule uses multiple-host load balancing, one of three client affinity modes is selected. When no client affinity mode is selected, Network Load Balancing load-balances client traffic from one IP address and different source ports on multiple-cluster hosts. This maximizes the granularity of load balancing and minimizes response time to clients. To assist in managing client sessions, the default single-client affinity mode load-balances all network traffic from a given client's IP address on a single-cluster host. The class C affinity mode further constrains this to load-balance all client traffic from a single class C address space.

In an asp.net app what allows session state to be persistent is when the clients affinity parameter setting is enabled; the NLB directs all TCP connections from one client IP address to the same cluster host. This allows session state to be maintained in host memory;

The client affinity parameter makes sure that a connection would always route on the server it was landed initially; thereby maintaining the application state.

Therefore I believe, same would happen for your windows based multi threaded app, if you utilize the affinity parameter.

  1. Network Load Balancing Best practices
  2. Web Farming with the Network Load Balancing Service in Windows Server 2003 might help you give an insight
KMan