views:

234

answers:

5

We have a C# application that connects to a FTP server, downloads some files, disconnects, and after a certain amount of time (selected by the user through the UI) reconnects and repeats the process. We implemented this using BackgroundWorker, but we noticed that after running for a longer time, the program stopped logging its actions, both in the UI and the log file. At that point, there were no files for it to download, so we uploaded some and it resumed activity as if nothing had happened.

The problem was that the regular users had no way of knowing that the program was still working, so we decided to implement it using our own threading. We did a simpler program, to rule out any other problems, and this one only connects to the FTP and disconnects. It stopped displaying messages just like BackgroundWorker (one time after 2 hours, one time after 22 hours, without any pattern that we could find, and on a computer that did nothing else).

DoFTPWork += new DoFTPWorkDelegate(WriteFTPMessage);

FTPWorkThread = new Thread(new ParameterizedThreadStart(Process));

//seData is the FTP login info
FTPWorkThread.Start(seData);

and the FTP method is:

private void Process(object seData1)
{
    seData = (SEData)seData1;
    while (!stopped)
    {
        try
        {
            ftp = null;
            ftp = new FTP_Client();

            if (ftp.IsConnected)
            {
                logMessages += DateTime.Now + "\t" + "info" + "\t" + "Ftp disconnected from " + seData.host + "\r\n";
                ftp.Disconnect();
            }

            ftp.Connect(seData.host, 21);
            ftp.Authenticate(seData.userName, seData.password);
            logMessages += DateTime.Now + "\t" + "info" + "\t" + "Ftp connected to " + seData.host + "\r\n";

            error = false;
            logMessages += DateTime.Now + "\t" + "info" + "\t" + "Trying to reconnect in 5 seconds\r\n";
            System.Threading.Thread.Sleep(5000);
            SlaveEventArgs ev = new SlaveEventArgs();
            ev.Message = logMessages;
            txtLog.Invoke(DoFTPWork, ev);
            System.Threading.Thread.Sleep(200);
            logMessages = "";
        }

        catch (Exception ex)
        {
            logMessages = "";
            if (ftp.IsConnected)
            {
                ftp.Disconnect();
            }
            ftp.Dispose();
            logMessages += DateTime.Now + "\t" + "ERR" + "\t" + ex.Message + "\r\n";

            logMessages += DateTime.Now + "\t" + "info" + "\t" + "Trying to reconnect in 5 seconds\r\n";
            SlaveEventArgs ev = new SlaveEventArgs();
            ev.Message = logMessages;
            txtLog.Invoke(DoFTPWork, ev);
            System.Threading.Thread.Sleep(5 * 1000);
            error = true;
        }
    }
}

WriteFTPMessage displays the message in a TextBox and in the original program wrote to a .txt file.

+3  A: 

If I'm understanding you correctly this while(!stopped) loop is the loop that is running for several hours? If that is the case, where are you terminating your ftp connection if anywhere? The only time you close it in the code you've posted is if an exception is thrown, otherwise you simply dereference the object and create a new one which is a pretty serious resource leak and at least contributing to the problem if not causing it.

Also it seems that ftp is globally accessible. Are you accessing it anywhere using a different thread? Is the object thread safe??

EDIT:

The biggest issue I see here is design. Not that I'm trying to bag on you or anything but you've got all sorts of operations intermixed. Threading, logging and ftp access code all in the same function.

What I would recommend is restructuring your program. Create a method much like the following:

// Called by thread
void MyThreadOperation()
{
   while(!stopped)
   {
      // This is poor design in terms of performance.
      // Consider using a ResetEvent instead.
      Thread.Sleep(5000);

      try
      {
         doFTPDownload();
      }
      catch(Exception ex)
      {
         logMessage(ex.ToString());
      }
   }
}

doFTPDownload() should be self contained. The FTP object should be created and opened inside the function when it is called and prior to it finishing it should be closed. This same concept should be applied to logMessage() as well. I would also recommend using a database to store log messages instead of a file so that locking issues don't complicate matters.

I know this isn't an answer in that you may still experience issues since I can't say for certain what could be the cause. However I'm confident with a little design restructuring you will be much better able to track down the source of the problem.

Spencer Ruport
+1 for the resource leak part. If there are lots of active connections, that could easily hit a (probably small) limit and cause a hang.
Jon Skeet
We did a restructuring, and in addition we cleared the logging textbox in the UI after every 100 lines and the customer didn't report any more freezing.
Rox
+1  A: 

I would suggest putting anything that can go wrong in the catch block (in particular the bit which disconnects from the FTP server) in its own try/catch block. In addition, log something as soon as you've caught the exception, before doing anything else - that way you're more likely to be able to tell if the logging dies half way through for some reason.

Also, add a log message to the end of the while loop so that you can tell if it's finished "normally".

Jon Skeet
+1 for putting logging first.
Spencer Ruport
A: 

I'd suggest using adplus when the issue reproduces and getting yourself a hang dump. Analyze in Windbg and SoS.

Is that in a Winforms application? Maybe the ISynchronizeInvoke implementation is hanging. Is this running as an interactive user?

JD Conley
A: 

Rupert: I have added ftp.Disconnect() after the catch block and started it again. I've checked the original application and we disconnected before reconnecting, so while it can influence the problem, I don't think it's causing it. There are no other threads accessing it, so there are no problems there.

Jon: I will, thanks for the suggestion.

JD: It is a Windows application, and after selecting the delay and FTP connect data, the user doesn't give any input. I'll look into ISynchronizeInvoke

Rox
I've updated my answer above.
Spencer Ruport
Also, if you have further questions comment directly on my answer. When you do that I can see it in my profile->responses area so I'll be sure to see it. If you just add another answer I may never see it.
Spencer Ruport
A: 

I think you'll have to work on making it more thread safe. You have a lot of shared fields: ftp, logMessages, error.

For instance this part:

        ev.Message = logMessages;
        txtLog.Invoke(DoFTPWork, ev);
        System.Threading.Thread.Sleep(200);
        logMessages = "";

Sounds to me like you're trying to solve a multithreading problem by sleeping and crossing your fingers you slept enough...

you could solve this via:

        ev.Message = logMessages.Clone();
        txtLog.Invoke(DoFTPWork, ev);

or use a different way of communicating.

Instead of the stopped boolean you could use a ManualResetEvent, which is a thread safe communication method. And for error you could use the same, or a Semaphore.

The nice thing about the ManualResetEvent is that you can use it to sleep your thread without locking it up completely. If I'm not mistaken the only way to stop your thread while it's sleeping is to call a thread.Abort. If you use the ManualResetEvent you could do the following:

if (!shouldStop.WaitOne(5000))
{
    // do thread stuff
}
else
{
    // do cleanup stuff and exit thread.
}

The nice thing is, you'll say I want to know if the event was signalled or not, but I will wait 5 seconds for it to signal or else I'll continue as not signalled.

So if your application decides to quit after 3 seconds in the sleep, it can just do a shouldStop.Set() and the thread will stop. It's still possible that the thread is in the process of communicating with the ftp server, so after the set you should do a thread.Join() to wait for it to exit.

I'm not saying your problem is related to my suggestions, if not I'm only trying to help reducing the possible causes.

Davy Landman