views:

329

answers:

3

Hi all, My program looked like this:

class Prog
 {
 BufferedImage offscreen;
 KindOfDatabase db;
 MyThread thread;

 class MyThread extends Thread
    {
    volatile boolean abort=false;
    long lastUpdated;
    public void run()
        {
        try
          {
          KindOfCursor c = db.iterator();
          while(c.getNext())
            {
            if(abort) break;
            //fill a histogram with the data,
            // calls SwingUtilities.invokeAndWait every 500ms to
            //do something with offscreen and update a JPanel
            }
          catch(Exception err)
            {
            err.printStackTrace();
            }
          finally
            {
            c.close();
            }
        }
    }

  void stopThread()
       {
       if(thread!=null)
          {
          thread.abort=true;
          thread.join();
          thread=null;
          }
       }
  void startThread()
      {
      stopThread();
      thread=new MyThread();
      thread.start();
      }
(....)
 }

1) The program worked well on my computer. But when I ran it threw a 'ssh -X remote.host.org ' connection, all was very slow and the program was frozen when thread.join() was invoked. I replaced 'join' by 'interrupt()' and the program was not anymore frozen. Why ? Should I fear that, when interrupt() is called, the 'finally' statement closing the iterator was not invoked ?

2) should I use 'Thread.isInterrupted()' instead of my boolean 'abort' ?

Thanks

UPDATE: my abort flag was labelled with volatile. Doesn't change the frozen state.

+4  A: 

You have shared data between two threads with no memory barriers. It's possible that if your main thread sets abort = true, it will set abort locally, but the other processor already has "false" in it's local cache for that field.

The volatile keyword is for exactly this purpose.

It may work on your machine because you have a single processor, but the remote machine may not.

Kevin Peterson
+1 for spotting volatile bug.
gustafc
interesting, I'll investigate this.
Pierre
Pierre
+4  A: 

Thread.join is meant to "freeze" your thread!

When you call join, the current thread will pause until the thread it's joining on has exited. In your case, this freeze is happening because the MyThread instance isn't exiting in a timely fashion.

One thing that might be biting you here - you need to declare the abort variable as volatile for changes to be reliably seen by other threads. Since you aren't doing that, it's entirely possible for your MyThread to see a cached version of the abort variable where it's always true. There's a short description of it here.

Edit: I missed your statement before about it working locally but not on a remote machine. This is actually not that uncommon with concurrency-related race conditions, as they may or may not manifest depending on various factors such as the hardware setup, the load on the machine etc. In this case for example, if your local machine only has one physical CPU/core then your erroneous code will probably run fine; there's only one CPU cache so the other thread is likely to "see" the main thread changing the abort flag, even though it's not explicitly marked as volatile. Now move that to a multi-core machine, and if the threads get scheduled onto separate cores, they'll use separate caches and all of a sudden won't see changes to non-volatile, cached variables.

This is why it's very important to understand the ramifications of concurrency and exactly what's guaranteed, because failures do not manifest in a consistent manner.

Update reaction: if this still isn't working when abort is volatile, it sounds a lot like MyThread isn't checking the variable often enough. Bear in mind that it will only "notice" the abort flag has been set directly after pulling another kind-of-row from your kind-of-cursor. If it's possible that the processing of a single element with respect to the histogram can take a long time, then of course it won't see the flag during this time.

You probably just need to check the abort flag more often. You say that you're calling invokeAndWait every 500ms; you ought to be checking your abort flag before each call of this so you have to wait at most 500ms for the thread to terminate! Have a look through this part of your code and see if there are any inner loops that you can modify to look more like while (... && !abort).

Another perpendicular approach to take is to start interrupting the thread as well. SwingUtilities.invokeAndWait in particular is interruptable, so if you call thread.interrupt() in your stopThread() method, then the invokeAndWait call will terminate very quickly (by throwing an InterruptedException) rather than you having to wait until it completes normally before your code gets a chance to check the abort flag again. This has the added benefit that if some other interruptible operation is taking a very long time to complete, it will also return straight away. (In this case you'd probably want to explicitly catch the InterruptedException in your processing bit of code; you don't really need to do anything to handle it per se, just take it as a sign to wake up and check the flag again. Read this excellent Developerworks article for more information about handling InterruptedExceptions).

Finally, if you're still having problems then some good old-fashioned println debugging will help. If you have MyThread print to the console (or possibly some log file) every time it checks the abort flag, you'll be able to see whether the problem is due to the MyThread itself "freezing"; as in this case it will never exit and the calling thread will never return from the join() call. Moving the checking of the abort flag lower-down will probably help with this.

Andrzej Doyle
Pierre
Thanks for this long answer. I can't connect to my remote server today. but I'll check your suggestion ASAP.
Pierre
+1  A: 

What does your KindOfCursor's c.next() do? Is it blocking and waiting for more data? If so, interrupting it may have caused it to stop waiting and return quickly.

Peter Štibraný
damnit, i was just typing that.
akf
getting a thread dump would display where the system is hanging, and could determine whether `c.next()` is the culprit. note that although `stopThread` sets `abort=true`, `if(abort)` needs to be executed in order to break.
akf