A problem I've encountered a few times in my career: in a tiered service architecture, a single downstream system can bring down the entire client application if all of its threads are consumed by a deadlock or some sort of infinite-loop bug. Under these conditions the server socket on the JEE server still accepts and queues requests from client applications. The client application then uses up all of its threads waiting for responses on properly established socket connections, and all users are locked out of the system because their requests are queued as well.

I've thought of a few solutions but I was wondering if the community has some better ones.

  1. Isolated thread pools for downstream requests. The drawback is that many small pools, each needing enough threads to ensure full throughput, compound the number of idle threads in your system. Spawning threads also means you need to deal with transaction and security contexts yourself. Not really a supported JEE solution.

  2. MDBs, the preferred asynchronous solution for JEE. This seems rather heavyweight, but it has the added benefit of letting the app server manage the MDB thread pools. (Currently number one on my list.)

  3. ESB. This is even more heavyweight and adds more network and processing time, but it allows you to set individual service timeouts. It would also take forever to get implemented in a big corporation, so it is probably not practical for my time frame.
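To make option 1 concrete, here is a minimal sketch using plain `java.util.concurrent`. The class name `DownstreamBulkhead`, its `call` method, and the fallback value are all hypothetical names chosen for illustration. It assumes unmanaged threads are acceptable, so it deliberately ignores the transaction and security context issue mentioned above. Each downstream system would get its own small, bounded pool, and callers wait only up to a timeout:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: one instance per downstream system.
class DownstreamBulkhead {
    private final ThreadPoolExecutor pool;
    private final long timeoutMillis;

    DownstreamBulkhead(int threads, int queueSize, long timeoutMillis) {
        // Fixed-size pool with a bounded queue: once both are full, submit()
        // throws RejectedExecutionException instead of queueing without limit.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        this.timeoutMillis = timeoutMillis;
    }

    // Runs the call on the isolated pool; returns the fallback on
    // saturation, timeout, or failure so the caller thread is never stuck.
    <T> T call(Callable<T> downstreamCall, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(downstreamCall);
        } catch (RejectedExecutionException e) {
            return fallback; // pool and queue are full: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; only helps if the call is interruptible
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        DownstreamBulkhead bulkhead = new DownstreamBulkhead(2, 4, 200);
        System.out.println(bulkhead.call(() -> "pong", "fallback"));
        // Simulated hung downstream call: the caller gets the fallback after 200 ms.
        System.out.println(bulkhead.call(() -> { Thread.sleep(5_000); return "late"; }, "fallback"));
        bulkhead.shutdown();
    }
}
```

Note that a worker stuck in a truly non-interruptible call stays stuck; the pool only caps how many threads that one downstream system can consume.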

Do you guys have any better ideas?

A: 

We use MDBs where the queue is persisted in a database, which has the benefit that messages are not lost if the system goes down.

You may also want to establish an asynchronous contract between the participating parties. What I mean by this is that a client sends you a message and, rather than doing a lot of heavyweight processing and returning a response, you simply send an acknowledgement and later send them an asynchronous message with the full results.

You should also establish a protocol for allowing the client to resend a message if they have not received a full response within an established time.
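As a rough illustration of that contract, here is a small in-memory sketch. All names (`AsyncContract`, `Receiver`, `PendingRequests`) are hypothetical, and it assumes each message carries a unique ID so that a resend can be recognised as a duplicate. A real system would use JMS or similar for transport; this only shows the bookkeeping on each side:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the acknowledge-now, reply-later contract.
class AsyncContract {

    // Receiving side: acknowledge immediately and remember message IDs,
    // so a resent message is not processed twice.
    static class Receiver {
        private final Map<String, String> accepted = new ConcurrentHashMap<>();

        // Returns true if the message is new, false if it is a resend.
        boolean acknowledge(String messageId, String payload) {
            return accepted.putIfAbsent(messageId, payload) == null;
        }
    }

    // Sending side: track requests that have no full response yet and
    // report the ones that have waited longer than the agreed time.
    static class PendingRequests {
        private final Map<String, Long> sentAt = new ConcurrentHashMap<>();
        private final long resendAfterMillis;

        PendingRequests(long resendAfterMillis) {
            this.resendAfterMillis = resendAfterMillis;
        }

        void sent(String messageId, long nowMillis) {
            sentAt.put(messageId, nowMillis);
        }

        void fullResponseReceived(String messageId) {
            sentAt.remove(messageId);
        }

        List<String> dueForResend(long nowMillis) {
            List<String> due = new ArrayList<>();
            for (Map.Entry<String, Long> entry : sentAt.entrySet()) {
                if (nowMillis - entry.getValue() >= resendAfterMillis) {
                    due.add(entry.getKey());
                }
            }
            return due;
        }
    }
}
```

The client would call `dueForResend` on a timer and resend those messages with the same ID; the receiver's duplicate check keeps the resend harmless.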

Paul Croarkin
A: 

You are correct that the MDB approach is the normal solution, and it typically supports timeouts as well, which will help keep requests from hanging. That said, it may not really fix the problem but just shift the backlog to your JMS queue, without responses ever being sent back to the client. Of course, if only one of several services causes this problem, the others will still be accessible.

Your proposal (1) is also doable on WebSphere or WebLogic via the commonj WorkManager, which lets you create managed threads in these environments and is pretty lightweight.

WorkManager and TimerManager API

Robin
The WorkManager stuff isn't available until WebLogic 9. We're still on 8.1, unfortunately.
William
A: 

Hi,

You could try a lightweight MDB approach with Atomikos MessageDrivenContainers (Message-Driven POJOs). Your application will be more lightweight, easier to test, and probably more scalable too.

See http://www.atomikos.com/Publications/J2eeWithoutApplicationServer.

HTH

Guy

Guy Pardon