Hello everybody,

My application works as middleware: it receives requests from clients, transforms them according to certain logic, and sends the transformed requests to another service provider as normal HTTP requests or web service SOAP requests. The application is deployed on two JBoss servers (in a cluster) behind a load balancer.

Let's say my application is A and the service provider is S.

I have now been informed that S will be down several (3-5) times every year. Each downtime will last about 4 hours. I can get a schedule of the downtimes.

During a downtime, A should no longer transform and send requests to S, but should put the received requests in a queue. After S is back, the requests in the queue should be processed.

Note:

  1. The requests A receives must be processed in the exact order in which they arrive, one by one. Processing a request means that A sends the transformed request to S and gets a success or error response. Usually this doesn't take much time.

  2. Based on 1, while A is working through the queued requests, new incoming requests should still be enqueued, even though S is already available. Only once the queue is empty can A go back to sending requests directly to S.

  3. A receives 2-3 requests every minute.
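To make rules 1 and 2 concrete, here is a minimal single-JVM sketch of the routing decision. The class and method names (`RequestRouter`, `accept`, `drainOne`) are hypothetical, the `sentToS` list is only a stand-in for the real call to S, and the sketch deliberately ignores the two-server cluster: there, the flag and the queue would have to live in shared storage such as the database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: in-memory only, one JVM, no clustering.
final class RequestRouter {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> sentToS = new ArrayList<>(); // stand-in for calling S
    private boolean sDown = false;

    synchronized void setDown(boolean down) {
        sDown = down;
    }

    // Rule 2: enqueue while S is down OR while older requests are still queued.
    synchronized String accept(String request) {
        if (sDown || !pending.isEmpty()) {
            pending.add(request);
            return "enqueued";
        }
        sentToS.add(request);
        return "sent";
    }

    // Rule 1: send the oldest queued request first; false if nothing was sent.
    synchronized boolean drainOne() {
        if (sDown || pending.isEmpty()) {
            return false;
        }
        sentToS.add(pending.remove());
        return true;
    }

    synchronized List<String> sent() {
        return new ArrayList<>(sentToS);
    }
}
```

The key point is the `!pending.isEmpty()` check in `accept`: a request that arrives after S is back still goes behind the queued ones, so direct sending only resumes once the queue has been drained.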

Since we have two JBoss servers, I planned to maintain this queue in a database, with threads working on the queue and managing the downtime status. However, the synchronization between the two JBoss servers always gives me a headache.

In short, the problems I ran into were:

  • How to set the downtime flag so that both JBoss servers enqueue requests instead of sending them. (The solution I thought of was to query the database for this flag before processing each request. The flag would be set by a thread. It might be the worst solution.)

  • After S is back, how to design the dequeue operation across the two JBoss servers. (It seems that at any given time one JBoss server is always idle...)

  • How to inform both JBoss servers: "Now the queue is empty, don't enqueue any longer."

The logic is a bit complicated. I hope I explained my problem clearly...

Do you guys have any idea on that?


Some more description of the FIFO requirement. If there were no downtime, A could process requests from different clients in parallel, because this transaction-like order is ensured by the clients themselves. For example:

client x :
-send http://..../createUser...
-received 'success' from A
-send http://../updateUser...
-received 'success' from A

If createUser() failed, updateUser would not be sent.

client y:
-send http://.. createCompany...

Suppose another client (y) sends a createCompany request at the same time as x's createUser. These two requests can be processed by A in parallel.

Once the downtime and the queue come into play:

-send http://..../createUser...
(downtime)
-received 'enqueue'
(S is back)
-send http://../updateUser...

Now the order "create -> update" needs to be ensured by A, not by the clients.


Thanks in advance!

Kent

A: 

Do you get any acknowledgements from S that a specific request was properly received? (If not, you should consider implementing this to make your app more robust, and minimise the chances of losing requests due to network problems, server crashes etc.)

Such an acknowledgement mechanism could be enhanced with increasing timeouts for the case where an ACK is not received in time after sending a request. I.e. if the ACK is not received within a configured timeout interval t, the request is resent. The next attempt uses a bigger timeout, e.g. 2*t, then 4*t etc. As long as the current request is not acknowledged, newer incoming requests are queued. Once the current request has been successfully transmitted, the queued requests are processed in FIFO order. Once the queue is empty, normal processing is resumed.
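As a rough sketch of the increasing-timeout idea, the following computes the sequence t, 2*t, 4*t, ... and resends until an ACK arrives. The names (`BackoffSender`, `sendAndAwaitAck`) are hypothetical, and the upper cap `maxMillis` and the `maxAttempts` limit are my own assumptions added to keep the example bounded; the real sending and ACK handling would sit behind the predicate.

```java
import java.util.function.LongPredicate;

// Hypothetical sketch of resending with doubling timeouts.
final class BackoffSender {

    // Timeout for a zero-based attempt: t, 2*t, 4*t, ... capped at maxMillis.
    // Assumes baseMillis and maxMillis are positive.
    static long timeoutForAttempt(long baseMillis, int attempt, long maxMillis) {
        long timeout = baseMillis;
        for (int i = 0; i < attempt; i++) {
            if (timeout > maxMillis / 2) {
                return maxMillis; // doubling again would exceed the cap
            }
            timeout *= 2;
        }
        return Math.min(timeout, maxMillis);
    }

    // sendAndAwaitAck sends the request and returns true if the ACK arrived
    // within the given timeout (in milliseconds).
    // Returns the number of attempts used, or -1 if no ACK was ever received.
    static int sendWithBackoff(LongPredicate sendAndAwaitAck,
                               long baseMillis, long maxMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long timeout = timeoutForAttempt(baseMillis, attempt, maxMillis);
            if (sendAndAwaitAck.test(timeout)) {
                return attempt + 1;
            }
        }
        return -1;
    }
}
```

While `sendWithBackoff` is still retrying the current request, newer requests would be placed in the queue rather than sent.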

This algorithm would automatically handle scheduled downtimes of S as well as any other network failures etc. at the price of somewhat more processing and network traffic. But with 2-3 requests per minute, this should not be a concern (unless individual requests are huge, of course).

Of course, it could even be improved to make the default timeout configurable by time periods. I.e. for the duration of a scheduled downtime of 4 hours, the default timeout could be set to 4 hours. Once the downtime is over, the timeout is reset to the default value.
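One way to picture the schedule-dependent timeout is sketched below, assuming the downtime schedule is available as a list of start/end instants. `DowntimeSchedule`, `Window` and `timeoutAt` are hypothetical names, and both timeout values are supplied by the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick the timeout based on the downtime schedule.
final class DowntimeSchedule {

    // One scheduled outage of S, covering [start, end).
    static final class Window {
        final Instant start;
        final Instant end;

        Window(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(Instant t) {
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    private final List<Window> windows;
    private final Duration defaultTimeout;
    private final Duration downtimeTimeout;

    DowntimeSchedule(List<Window> windows, Duration defaultTimeout, Duration downtimeTimeout) {
        this.windows = new ArrayList<>(windows);
        this.defaultTimeout = defaultTimeout;
        this.downtimeTimeout = downtimeTimeout;
    }

    // Inside a scheduled window use the long timeout, otherwise the default.
    Duration timeoutAt(Instant now) {
        for (Window w : windows) {
            if (w.contains(now)) {
                return downtimeTimeout;
            }
        }
        return defaultTimeout;
    }
}
```

With a 4-hour window and a `downtimeTimeout` of 4 hours, a request sent during the outage would simply wait out the downtime, and the default applies again as soon as the window ends.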

Péter Török
A does have error handling. Say we had a network problem: A would get the corresponding exception and save that status. However, there is no retry mechanism. If a timeout is caught, for example, the request status is simply set to 'failed'.
Kent