I am wondering conceptually how
load-balancing works on the EJB-level
(not web session replication) with
Java EE containers like Glassfish.
From what I have gleaned your remote
interface is a proxy that delegates
your call to one of many servers you
may have in an environment.
You are right. In GlassFish, the initial lookup tries to contact one of the servers listed in the jndi.properties
file. That server then knows all the other nodes in the cluster, which are used for round-robin dispatch. The remote reference (proxy) does this for you transparently. In theory, nodes can be added to or removed from the cluster dynamically. See Glassfish RMI-IIOP load balancing and fail-over.
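To make the idea of a remote reference that balances calls on its own more concrete, here is a minimal sketch in plain Java. It is not GlassFish code: the `Greeter` interface, the `node` helper and the in-memory node list are all hypothetical stand-ins, and a real RMI-IIOP stub would talk to remote endpoints rather than to local objects. It only illustrates the round-robin delegation described above.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {

    // Hypothetical remote business interface (not from any real application).
    public interface Greeter {
        String greet(String name);
    }

    // Stand-in for a bean instance living on one cluster node.
    public static Greeter node(String nodeName) {
        return name -> "Hello " + name + " from " + nodeName;
    }

    // Builds a proxy that forwards each call to the next node in the list.
    public static Greeter roundRobin(List<Greeter> nodes) {
        AtomicInteger next = new AtomicInteger();
        InvocationHandler handler = (proxy, method, args) -> {
            // Pick the next node in rotation; floorMod keeps the index valid
            // even if the counter eventually overflows.
            Greeter target = nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Rethrow the exception raised by the target itself.
                throw e.getCause();
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = roundRobin(List.of(node("node1"), node("node2"), node("node3")));
        for (int i = 0; i < 4; i++) {
            System.out.println(greeter.greet("client"));
        }
    }
}
```

The caller only ever sees a `Greeter`; which node serves a given call is decided inside the proxy, which is the sense in which the reference is "transparent".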
If things fail are they supposed to be
able to "finish" on another server? I
want to understand the basic theory
behind this load balancing, why is it
better than a bunch of servers all
running a plain web application with
session affinity on a load-balancer?
If the bean is stateless, you don't even need any kind of affinity and the request can be processed on any node. Each remote reference acts as a load balancer on its own.
If the bean is stateful, it gets hairier. The cluster tries to maintain two replicas of the bean, and requests are dispatched to these two replicas. If one of the nodes crashes, the cluster creates another replica until the node is back. This is indeed similar to HTTP session replication with session affinity.
Unlike a web server, however, beans are transactional components. If an exception occurs, the transaction is rolled back and the stateful bean is invalidated, because its state may no longer be consistent.
As pointed out by Pascal, there is some fail-over for certain kinds of failure. If the node is not available, the request can be re-routed to another node. But if the node fails while the request is being processed, I don't know whether the request can be resubmitted somewhere else.
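The distinction between the two failure cases can be sketched as follows. This is only an illustration under assumptions, not GlassFish's actual behaviour: the `Node` interface, the `NodeUnavailableException` type and the helper methods are hypothetical. The sketch re-routes a request when the node could not be reached at all, and deliberately does not retry when the node fails mid-request, since the client cannot tell how much work was already done.

```java
import java.util.List;

public class FailoverSketch {

    // Hypothetical: raised when a node cannot be contacted at all.
    public static class NodeUnavailableException extends RuntimeException {
        public NodeUnavailableException(String message) {
            super(message);
        }
    }

    // Hypothetical view of one cluster node.
    public interface Node {
        String process(String request);
    }

    public static Node healthy(String name) {
        return request -> name + " handled " + request;
    }

    public static Node unreachable() {
        return request -> {
            throw new NodeUnavailableException("node is down");
        };
    }

    public static Node crashesMidRequest() {
        return request -> {
            throw new IllegalStateException("node died while processing");
        };
    }

    // Tries each node in order. Only "never reached the node" failures are
    // re-routed; any other exception propagates to the caller unchanged.
    public static String invoke(List<Node> nodes, String request) {
        NodeUnavailableException last = null;
        for (Node node : nodes) {
            try {
                return node.process(request);
            } catch (NodeUnavailableException e) {
                last = e; // request was never started there, so rerouting is safe
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no nodes configured");
        }
        throw last; // every node was unreachable
    }

    public static void main(String[] args) {
        System.out.println(invoke(List.of(unreachable(), healthy("node2")), "req-1"));
    }
}
```

Whether a mid-request failure could be resubmitted safely depends on whether the operation is idempotent and on what the transaction had already done, which is why this sketch leaves that case to the caller.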
If you want to know more, I suggest you read Guide to GlassFish High Availability and Cluster Support in Glassfish.