views:

1120

answers:

8

We have a Windows Service written in C#. The service spawns a thread that does this:

private void ThreadWorkerFunction()
{
  while(false == _stop) // stop flag set by other thread
  {
    try
    {
      openConnection();

      doStuff();

      closeConnection();
    }
    catch (Exception ex)
    {
      log.Error("Something went wrong.", ex);

      Thread.Sleep(TimeSpan.FromMinutes(10));
    }
  }
}

We put the Thread.Sleep in after a couple of times when the database had gone away and we came back to 3Gb logs files full of database connection errors.

This has been running fine for months, but recently we've seen a few instances where the log.Error() statement logs a "System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable" exception and then never ever comes back. The service can be left running for days but nothing more will be logged.

Having done some reading I know that Thread.Sleep is not ideal, but why would it simply never come back?

A: 

Have you tried using Monitor.Pulse (ensure your thread is using thread management before running this) to get the thread to do something? If that works, then you're going to have to look a bit more into your threading logic.

James Burgess
+3  A: 

We put the Thread.Sleep in after a couple of times when the database had gone away and we came back to 3Gb logs files full of database connection errors.

I would think a better option would be to make it so that your logging system trapped duplicates, so that it could write something like, "The previous message was repeated N times".

Assume I've written a standard note about how you should open your connection at the last possible moment and close it at the earliest opportunity, rather than spanning a potentially huge function in the way you've done it (but perhaps that is an artefact of your demonstrative code and your application is actually written properly).

When you say that it's reporting the error you describe, do you mean that this handler is reporting the error? The reason it's not clear to me is that in the code snippet you say "Something went wrong", but you didn't say that in your description; I wouldn't want this to be something so silly as the exception is being caught somewhere else, and the code is getting stuck somewhere other than the sleep.

DrPizza
+5  A: 

Dig in and find out? Stick a debugger on that bastard!

I can see at least the following possibilities:

  1. the logging system hangs;
  2. the thread exited just fine but the service is still running because some other part has a logic error.

And maybe, but almost certainly not, the following:

  • Sleep() hangs.

But in any case, attaching a debugger will show you whether the thread is still there and whether it really has hung.

Sander
A: 

From the code you've posted, it's not clear that after an exception is thrown the system is definitely able to restart - e.g. if the exception comes from doStuff(), then the control flow will pass back (after the 10 minute wait) to openConnection(), without ever passing through closeConnection().

But as others have said, just attach a debugger and find where it actually is.

Will Dean
A: 

Try Thread.Sleep(10 * 60 * 1000)

Shawn Simon
A: 

I never fully figured out what was going on, but it seemed to be related to ThreadInterruptedExceptions being thrown during the 10 minute sleep, so I changed to code to:

private void ThreadWorkerFunction()
{
  DateTime? timeout = null;

  while (!_stop)
  {
    try
    {
      if (timeout == null || timeout < DateTime.Now)
      {
        openDatabaseConnections();

        doStuff();

        closeDatabaseConnections();
      }
      else
      {
        Thread.Sleep(1000);
      }
    }
    catch (ThreadInterruptedException tiex)
    {
      log.Error("The worker thread was interrupted... ignoring.", tiex);
    }
    catch (Exception ex)
    {
      log.Error("Something went wrong.", ex);

      timeout = DateTime.Now + TimeSpan.FromMinutes(10);
    }
  }
}

Aside from specifically catching the ThreadInterruptedException, this just feels safer as all the sleeping happens within a try block, so anything unexpected that happens will be logged. I'll update this answer if I ever find out more.

d4nt
Is your worker thread in a threadpool, or is it a real thread?
Roger Lipscombe
A: 

Stumbled on this while looking for a Thread.Sleep problem of my own. This may or may not be related, but if your doSomething() throws an exception, closeDatabaseConnections() won't happen, which has some potential for resource leaks.. I'd put that in a finally block. Just something to think about.

tkrehbiel
A: 

Hi there, I've had exactly the same problem. Moving the Sleep line outside of the exception handler fixed the problem for me, like this:

bool hadError = false;
try {
  ...
} catch (...) {
  hadError = true;
}
if (hadError)
  Thread.Sleep(...);

Interrupting threads does not seem to work in the context of an exception handler.

Michel