In a multithreaded Java application, I just tracked down a strange-looking bug, realizing that what seemed to be happening was this:
- one of my objects was storing a reference to an instance of ServerSocket
- on startup, one thread would, in its main loop in run(), call accept() on the socket
- while the socket was still waiting for a connection, another thread would try to restart the component
- under some conditions, the restart process missed the cleanup sequence before it reached the initialization sequence
- as a result, the reference to the socket was overwritten with a new instance, which then wasn't able to bind() anymore
- the socket which was blocking inside the accept() wasn't accessible anymore, leaving a complete shutdown and restart of the application as the only way to get rid of it.
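The sequence above can be reproduced in a few lines. This is a minimal sketch, not the actual code: the class name LostSocketDemo and the method rebindFailsWithoutCleanup are made up for illustration, and port 0 is used only so the OS picks a free port. It assumes a typical platform where a second listening socket on the same port is rejected with a BindException.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; names are illustrative, not from the original application.
class LostSocketDemo {

    // Returns true if re-initializing without cleanup fails to bind.
    static boolean rebindFailsWithoutCleanup() throws Exception {
        ServerSocket first = new ServerSocket(0); // port 0: let the OS pick a free port
        int port = first.getLocalPort();

        // Stand-in for the thread whose run() loop calls accept().
        Thread acceptor = new Thread(() -> {
            try {
                first.accept(); // blocks until a client connects or the socket is closed
            } catch (IOException closedOrFailed) {
                // reached once first.close() is called in the finally block below
            }
        });
        acceptor.start();

        try {
            // "Restart" that skips cleanup: the old socket still owns the port.
            new ServerSocket(port).close();
            return false; // binding unexpectedly succeeded
        } catch (BindException addressInUse) {
            return true;  // the new instance cannot bind while the old one is open
        } finally {
            first.close();   // in the real bug this reference had already been overwritten
            acceptor.join();
        }
    }
}
```

In the sketch the old socket can still be closed because the local variable first is kept around; in the bug described above, that reference was gone by this point.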
Which leaves me wondering: a) Does the blocking call prevent, or interfere with, GC in any way? b) If the ServerSocket does get GCed, will that make the socket available again?
In general, what are good practices I can follow to avoid this type of bug? For instance, I learned two lessons here:
- All lifecycle logic (i.e. component-level init-start-stop-cleanup cycles) must be synchronized. Rather obvious, I guess, but I didn't take it seriously enough.
- Lifecycle logic should be as simple as possible to avoid my problem of non-obvious code paths that skip cleanup or initialization steps.
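The two lessons could look roughly like the following. This is a sketch under assumptions, not a definitive implementation: the class Listener and its methods start, stop, restart, port and isRunning are hypothetical names. It relies on the documented behavior of ServerSocket.close(), which makes a thread blocked in accept() throw a SocketException, so stop() can always release the socket before start() creates a new one.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical component; class and method names are illustrative only.
class Listener {
    private final int requestedPort;   // 0 means "any free port"
    private ServerSocket serverSocket; // guarded by "this"
    private Thread acceptThread;       // guarded by "this"

    Listener(int requestedPort) {
        this.requestedPort = requestedPort;
    }

    synchronized void start() throws IOException {
        if (serverSocket != null) {
            throw new IllegalStateException("already started");
        }
        ServerSocket socket = new ServerSocket(requestedPort);
        serverSocket = socket;
        acceptThread = new Thread(() -> acceptLoop(socket));
        acceptThread.start();
    }

    synchronized void stop() throws IOException, InterruptedException {
        if (serverSocket == null) {
            return; // nothing to clean up
        }
        serverSocket.close(); // a blocked accept() now throws SocketException
        acceptThread.join();  // safe: the accept thread never takes this lock
        serverSocket = null;
        acceptThread = null;
    }

    // Single code path for restarts: cleanup always runs before initialization.
    synchronized void restart() throws IOException, InterruptedException {
        stop();
        start();
    }

    synchronized boolean isRunning() {
        return serverSocket != null;
    }

    synchronized int port() {
        return serverSocket == null ? -1 : serverSocket.getLocalPort();
    }

    private static void acceptLoop(ServerSocket socket) {
        while (!socket.isClosed()) {
            try (Socket client = socket.accept()) {
                // handle the connection here
            } catch (IOException closedOrFailed) {
                return; // socket was closed by stop(), or accepting failed; end the loop
            }
        }
    }
}
```

The accept thread receives the socket as a parameter rather than reading the field, so overwriting the field can never strand it on an instance nobody else can reach.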