I have a system which needs to send requests to an external system whenever a user performs a search on my system.

If the external system is down or is taking an unusually long time to answer, I would like my system to "back off" for a while. Instead of trying to make more requests to the external system, I want to just let the user of my system know immediately that we will not be processing their requests at the moment.

This would result in a better experience for the user (they don't have to wait for timeouts), less resource usage in my system (threads won't be tied up waiting for responses or timeouts from the external system), and it would spare the external system in a situation where it is probably already struggling with load.

After some time, or when my system discovers that the external system is responding again, I would like to resume normal behaviour.

Are there any patterns or standard ways of doing this kind of thing? Specifically, the mechanism for keeping track of timed-out/long requests, and some sort of control mechanism for when we should start trying again.

+2  A: 

I don't remember seeing this described in the literature, but the pattern I've noticed for such tasks centers on a "scheduling queue" -- a way to make various things happen (i.e., get functions or methods called back) at certain times unless previously cancelled (e.g., Python's sched standard library module). When you send an (async) request to the back-end, you also schedule a timeout event for X seconds from now. Either the request object knows the ID of the scheduled timeout (so it can cancel it if the request is satisfied before then), or a set of pending requests is also maintained (so the timeout handler knows when it's no longer needed) -- which is a good idea anyway, as it makes handling timeouts that really mean it easier; see below.
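A minimal sketch of this pairing of requests with cancellable timeout events, using the `sched` module mentioned above (all names like `send_request` and `on_reply` are hypothetical; the real system would fire the async backend call where the comment indicates):

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
pending = {}  # request_id -> scheduled timeout event

TIMEOUT_X = 0.2  # seconds before we consider the backend slow (short for demo)

def on_timeout(request_id):
    # The backend didn't answer in time; forget the request here and
    # let the back-off logic take over.
    pending.pop(request_id, None)

def send_request(request_id):
    # ...fire off the async request to the backend here...
    # Then schedule the matching timeout event and remember it.
    event = scheduler.enter(TIMEOUT_X, 1, on_timeout, (request_id,))
    pending[request_id] = event

def on_reply(request_id):
    # The reply beat the timeout: cancel the now-unneeded event.
    event = pending.pop(request_id, None)
    if event is not None:
        scheduler.cancel(event)
```

The `pending` dict plays both roles at once: it lets a reply cancel its timeout, and it is the container of in-flight requests that a "real" timeout will sweep into the retry bin.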

When a timeout does occur, it schedules a retry for Y seconds in the future, moves all pending requests from that container into a container of requests to be retried later (cancelling their other timeouts, if that's how the system is set up), and sends a "backend is slow, we'll retry in Y seconds" notification to all waiting clients.

When a retry event occurs, etc., etc. If new requests arrive while the system is suspended, they go straight into the "to be retried" bin.
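The suspend/retry control described in the last two paragraphs could look something like this (a sketch with hypothetical names; `now` is passed in explicitly so the logic is easy to test, and the client-notification step is left as a comment):

```python
import time

RETRY_Y = 30.0  # how long to stay suspended after a timeout

class Backoff:
    def __init__(self):
        self.suspended_until = 0.0  # epoch seconds; 0 means not suspended
        self.to_retry = []          # requests parked until the retry event

    def on_backend_timeout(self, pending_requests, now=None):
        # A timeout fired: suspend for Y seconds and park everything pending.
        now = time.time() if now is None else now
        self.suspended_until = now + RETRY_Y
        self.to_retry.extend(pending_requests)
        # ...also notify waiting clients: "backend is slow, retrying in Y s"

    def submit(self, request, now=None):
        # Normal entry point for new requests.
        now = time.time() if now is None else now
        if now < self.suspended_until:
            self.to_retry.append(request)  # suspended: straight into the bin
            return False                   # caller tells the user "not now"
        return True                        # normal path: send to the backend

    def on_retry_event(self, now=None):
        # The scheduled retry fires: resume and hand back parked requests.
        self.suspended_until = 0.0
        parked, self.to_retry = self.to_retry, []
        return parked
```

The retry event itself would be scheduled on the same scheduling queue as the timeouts, so one mechanism drives both halves of the behaviour.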

While I can't find this pattern described anywhere, if it is written up it's probably in Schmidt's excellent book -- highly recommended reading anyway!-)

Alex Martelli