I have a supervisor with two worker processes: a TCP client that handles the connection to a remote server and an FSM that handles the connection protocol.

Handling TCP errors in the child process complicates its code significantly. So I'd prefer to "let it crash", but that has a different problem: when the server is unreachable, the maximum number of restarts is quickly reached and the supervisor crashes along with my entire application, which is quite undesirable in this case.
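
To illustrate, a minimal sketch of the supervisor in question (conn_sup, tcp_client and proto_fsm are hypothetical names): with a one_for_one strategy and an intensity of 5 restarts in 10 seconds, a persistently unreachable server exhausts the limit almost immediately and the supervisor itself exits.

    -module(conn_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    init([]) ->
        %% Allow at most 5 restarts in 10 seconds; one more and the
        %% supervisor gives up and crashes as well.
        {ok, {{one_for_one, 5, 10},
              [{tcp_client, {tcp_client, start_link, []},
                permanent, 5000, worker, [tcp_client]},
               {proto_fsm, {proto_fsm, start_link, []},
                permanent, 5000, worker, [proto_fsm]}]}}.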

What I'd like is a restart strategy with back-off; failing that, it would be good enough if the supervisor were aware that it is being restarted due to a crash (i.e. had this passed as a parameter to its init function). I've found this mailing list thread, but is there a more official/better-tested solution?

+1  A: 

I've had this problem many times working with Erlang and have tried many solutions. I think the best I've found is to have an extra process that is started by the supervisor and in turn starts the child that might crash.

It starts the child on start-up, waits for the child to exit, and then either restarts the child (with a delay) or exits, as appropriate. I think this is simpler than the back-off server (which you link to) as you only need to keep state regarding a single child.
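
A minimal sketch of such a wrapper, assuming the real worker is started via a hypothetical tcp_client:start_link/0. The wrapper is what the supervisor actually runs; it traps exits so it can restart the child itself after a delay, and the supervisor's restart intensity is never touched by a flapping connection.

    -module(delayed_restarter).
    -export([start_link/0]).

    -define(RESTART_DELAY, 5000).  %% back-off before (re)starting, in ms

    start_link() ->
        Parent = self(),  %% start_link is called in the supervisor's process
        {ok, spawn_link(fun() -> init(Parent) end)}.

    init(Parent) ->
        process_flag(trap_exit, true),
        loop(Parent, start_child()).

    %% Start the real worker; retry with a delay if even starting fails
    %% (e.g. the server is unreachable).
    start_child() ->
        case tcp_client:start_link() of
            {ok, Pid}        -> Pid;
            {error, _Reason} -> timer:sleep(?RESTART_DELAY),
                                start_child()
        end.

    loop(Parent, Child) ->
        receive
            {'EXIT', Child, normal} ->
                exit(normal);                 %% child finished on purpose
            {'EXIT', Child, _Crash} ->
                timer:sleep(?RESTART_DELAY),  %% back off, then restart
                loop(Parent, start_child());
            {'EXIT', Parent, Reason} ->
                exit(Reason);                 %% supervisor shutting us down
            {'EXIT', _Stale, _} ->
                loop(Parent, Child)           %% leftover exit from a failed start
        end.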

Another solution I've used is to start the child processes as transient and have a separate process that polls and issues restarts to any children that have crashed.
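
A sketch of that poller, assuming the workers live under a registered supervisor named conn_sup (hypothetical name): supervisor:which_children/1 reports children that are down but whose specs are still registered with an undefined pid, and supervisor:restart_child/2 brings them back. The polling interval doubles as the back-off, since a child is restarted at most once per poll.

    -module(child_poller).
    -export([start_link/0]).

    -define(INTERVAL, 5000).  %% poll (and effective back-off) interval, ms

    start_link() ->
        {ok, spawn_link(fun loop/0)}.

    loop() ->
        timer:sleep(?INTERVAL),
        %% Restart every registered child that is not currently running.
        [supervisor:restart_child(conn_sup, Id)
         || {Id, undefined, _Type, _Mods} <- supervisor:which_children(conn_sup)],
        loop().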

cthulahoops
It does make a lot of sense.
Alexey Romanov
+1  A: 

You might find our supervisor cushion to be a good starting point. I use it to slow down the restarts of things that must be running but are failing quickly on startup (such as ports that are encountering a resource problem).

Dustin