I have an application distributed over 2 nodes. When I halt() the first node the failover works perfectly, but ( sometimes ? ) when I restart the first node the takeover fails and the application crashes since start_link returns already started.
SUPERVISOR REPORT <0.60.0> 2009-05-20 12:12:01
===============================================================================
Reporting supervisor {local,twitter_server_supervisor}
Child process
errorContext start_error
reason {already_started,<2415.62.0>}
pid undefined
name tag1
start_function {twitter_server,start_link,[]}
restart_type permanent
shutdown 10000
child_type worker
ok
My app
start(_Type, Args)->
twitter_server_supervisor:start_link( Args ).
stop( _State )->
ok.
My supervisor :
start_link( Args ) ->
supervisor:start_link( {local,?MODULE}, ?MODULE, Args ).
Both nodes are using the same sys.config file.
What am I not understanding about this process that the above should not work ?