I'm having a tough problem with invoking a native function using JNI from a thread.

The native function is legacy code that performs a computation-intensive task. Since I don't want to freeze the rest of the program, the computation should be performed in a background thread. EventBus is used to send the calculation result back to the main program.

Basically it should be quite simple, something like this:

public class CalculationEngine {
  private CalculationEngine(){}

  public static void calculateInBackground(final Parameters parameters) {

    new Thread(new Runnable() {
      public void run() {
        // Someone might change the parameters while our thread is running, so:
        final Parameters clonedParameters = parameters.clone();
        Results results = new Results();
        natCalc(clonedParameters, results);
        EventBus.publish("Results", results);
      }
    }).start();

  }

  public static void calculateNormally(final Parameters parameters) {
    Results results = new Results();
    natCalc(parameters, results);
    EventBus.publish("Results", results);
  }

  private static native synchronized void
    natCalc(Parameters parameters, Results results);      
}

Now, the calculateNormally method, which blocks the main program, works fine, but the calculateInBackground method, which just constructs a background thread to do the same thing, causes various crashes in the native code when it's invoked consecutively. By consecutively I mean that it's called again only after the previous thread has finished and returned the result. Note that the native code is marked synchronized to ensure that only one instance of it can be running at a time.

My question is, how on earth can the native code behave differently depending on whether it's invoked from the main thread or from some other thread? It's as if the native code were keeping "state", and not really quitting, when it's called from a thread other than the main thread. Is there a way to "clean" or "flush" a thread after it's finished? There must be something about JNI and threads that I simply don't know.

Thanks for any hints!

+1  A: 

I figured out a working solution, after googling and finding the phrase "I've found JNI to be very buggy when called from seperate threads... So make sure only one thread ever calls your native code!". It seems to be true; the solution is to keep a persistent, "reusable" thread around - I used Executors.newSingleThreadExecutor() - and to call the native code only from that thread. It works.

So the difference from JNI's point of view was not between the main thread and some other thread, but in using different threads in consecutive calls. Note that in the problematic code a new thread was constructed each time. It should work that way, but it doesn't. (And no, I'm not caching the JNIEnv pointer.)

Whether it's a JNI bug, a bug in the native code, something in the interaction between them and the OS, or whatever, would be interesting to know. But sometimes you just have no chance to debug 10000+ lines of existing code in detail, and you're happy to get it to work. Here's a working version of the example code; let's call this a workaround:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CalculationEngine {
  private CalculationEngine(){}

  // One long-lived worker thread: every native call runs on this same thread.
  private static final ExecutorService executor =
    Executors.newSingleThreadExecutor();

  public static void calculateInBackground(final Parameters parameters) {
    // Clone in the calling thread and give each task its own copy, so that
    // neither later changes by the caller nor a second call made before this
    // task runs can affect this calculation.
    final Parameters clonedParameters = parameters.clone();
    executor.submit(new Runnable() {
      public void run() {
        Results results = new Results();
        natCalc(clonedParameters, results);
        EventBus.publish("Results", results);
      }
    });
  }

  private static native synchronized void
    natCalc(Parameters parameters, Results results);
}
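
The property the workaround relies on, that a single-thread executor runs consecutive tasks on the same thread while `new Thread(...)` gives a different thread each time, can be checked in isolation. This is only a sketch: the class and method names (`SameThreadCheck`, `executorReusesThread`, `freshThreadsDiffer`) are made up for illustration and no native code is involved.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SameThreadCheck {

    // Submits two consecutive tasks to a single-thread executor and reports
    // whether both ran on the very same Thread object.
    static boolean executorReusesThread() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Thread> whoAmI = Thread::currentThread;
            Thread first = executor.submit(whoAmI).get();
            Thread second = executor.submit(whoAmI).get();
            return first == second;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // Starts two threads one after the other (never concurrently), as the
    // original calculateInBackground did, and reports whether they differ.
    static boolean freshThreadsDiffer() {
        final Thread[] seen = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int slot = i;
            Thread t = new Thread(() -> seen[slot] = Thread.currentThread());
            t.start();
            try {
                t.join(); // wait, so the two threads never overlap
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen[0] != seen[1];
    }

    public static void main(String[] args) {
        System.out.println("executor reuses thread: " + executorReusesThread());
        System.out.println("fresh threads differ:   " + freshThreadsDiffer());
    }
}
```

Both checks should come out true, which matches the observation that only the calling thread's identity changed between the failing and the working version.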
Joonas Pulakka
Unless you can find specific bugs that apply to your situation, or can identify at a low level what is happening, you shouldn't blame JNI -- regardless of what some random person on the web claims. See http://bugs.sun.com/bugdatabase/
kdgregory
Incidentally, it's very likely that *your* JNI code is holding onto data between calls. Or more likely, exposing data in a way that multiple concurrent threads can change (the fact that it works when running on a single thread points in this direction).
kdgregory
I'll try to submit a bug, although it's tricky because the native code I'm working on is proprietary so I definitely can't send it to Sun, and how do you reproduce something without the code...? Anyway, I'm now one of the random persons on the web who claim that JNI calls might work better when done from a single thread :-)
Joonas Pulakka
@kdgregory: Even in the first example I limited the number of *simultaneous* threads to one (the synchronized keyword in the native declaration is just one of the safety measures), because I know that my JNI code definitely isn't thread safe (there are some static/global variables). But it seems that the calling thread needs to be the *same individual thread* every time.
Joonas Pulakka
If you're going to submit a bug report, you have to clearly identify what's failing, with a code example; otherwise it will just get closed. And before you do that (rather than looking closely at your own code), you might want to take a look at the source code for java.lang.System. See all the native methods? If JNI truly didn't work in a multi-threaded environment, how long do you think it would take for someone to notice those methods failing?
kdgregory
Well, hard to say which to blame. But see the code? I'm not blaming JNI for *multi-threading* problems - I've not even tried multithreading, since I know that my native (legacy) code is not thread-safe. The fact is that my problem was solved by always calling the function from the *same* thread, as opposed to calling it consecutively from different threads.
Joonas Pulakka
So there's something that is dependent on the thread. That something is either in your code or in the JVM. If you want to blame the JVM, rather than your code, that's your choice. Not exactly rational, and quite likely to result in bugs down the road.
kdgregory
It's as easy to say that "it's your code" as it is to blame the JVM. Anyway, if you get 10000+ lines of legacy code to work by modifying the environment slightly - even if you don't quite understand what's going on - it is a rational choice, IMHO. Another option would be to debug it for months and maybe come up with something. Whether it's rational, it's your choice :-)
Joonas Pulakka
...or perhaps I should use the word "pragmatic". Anyway, modified my answer now.
Joonas Pulakka
+2  A: 

My advice on using JNI is DON'T if you can possibly avoid it. The chances are that it will cause stability issues for you. Here are some possible alternatives:

  1. Recode the native library in Java.
  2. Write a wrapper command for the native library in C / C++ / whatever and run it using java.lang.Process and friends.
  3. Turn the native library into a daemon and access it using sockets.
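
As a rough sketch of alternative 2, the Java side could look something like the following. Everything about the wrapper is an assumption made for illustration: the executable name `./natcalc`, the idea that it takes its inputs as plain command-line arguments, and that it prints its result to standard output. The class and method names are likewise hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WrapperProcessSketch {

    // Builds the command line for a hypothetical wrapper executable that
    // receives its numeric inputs as plain arguments.
    static List<String> buildCommand(String wrapperPath, double... inputs) {
        List<String> command = new ArrayList<>();
        command.add(wrapperPath);
        for (double input : inputs) {
            command.add(Double.toString(input));
        }
        return command;
    }

    // Runs the command in its own OS process and returns everything it
    // printed. A crash in the native code then kills only that process,
    // not the JVM.
    static String run(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        // Drain the output before waiting for exit, so the child cannot
        // block on a full pipe buffer.
        byte[] raw = process.getInputStream().readAllBytes();
        String output = new String(raw, StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(
                "wrapper exited with code " + exitCode + ": " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // "./natcalc" is a placeholder; substitute the real wrapper binary.
        List<String> command = buildCommand("./natcalc", 1.5, 2.0);
        System.out.println(run(command));
    }
}
```

The result still has to be parsed from the returned text, which is the kind of private protocol this approach requires.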
Stephen C
Thanks for your thoughts. Alternative 1 is not practical when you have LOTS of existing, nontrivial native code to interact with. Alternatives 2 and 3 are quite viable - however, then you need to create some kind of protocol to send your stuff back and forth between the native and Java sides. JNI is, in principle, the most straightforward option if you have just a few function calls. But in practice it seems to have its own peculiarities; it can expose flaws in the existing native code, no matter how carefully you make the JNI layer...
Joonas Pulakka
You are right about needing to define a (private) protocol. And another problem with alternatives 2 and 3 is that the overheads of an interaction are greater; i.e. fork/exec or RPC versus a procedure call. But in spite of this, I'd still go for a non-JNI solution unless there was a show-stopper reason not to.
Stephen C
+1  A: 

While you've got an answer, I don't think much has been said about the possible root cause. Here are a few possibilities, but there are others. Note that these apply to Windows.

An apartment-threaded COM object may be involved. Apartment-threaded COM objects, which are the only type VB can create, can only be used on the thread that creates them.

Security features, like impersonation, are often isolated per thread. If the initialization code modified the context of the thread, future calls that expect the context to be in place will fail when they run on a different thread.

Thread-local storage is a technique some applications use to support multithreading (Java has a similar feature, ThreadLocal). Data stored this way by one thread is not visible to any other thread.
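
To illustrate the thread-specific storage case, here is a Java-only sketch using `ThreadLocal` as a stand-in for whatever per-thread state a native library might keep. Whether the DLL in the question actually does this is not known; the class and method names are invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalStateSketch {

    // Stand-in for state that a library initializes once per thread.
    private static final ThreadLocal<Boolean> INITIALIZED =
            ThreadLocal.withInitial(() -> false);

    static void initialize() { INITIALIZED.set(true); }

    static boolean isInitialized() { return INITIALIZED.get(); }

    // Initializes on one fresh thread, then checks from a second fresh
    // thread: the second thread sees its own, untouched copy.
    static boolean visibleFromAnotherThread() {
        final boolean[] result = new boolean[1];
        runAndJoin(ThreadLocalStateSketch::initialize);
        runAndJoin(() -> result[0] = isInitialized());
        return result[0];
    }

    // Initializes and checks on the single worker thread of one executor:
    // both tasks share the same thread, so the state is still there.
    static boolean visibleOnSameExecutorThread() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.execute(ThreadLocalStateSketch::initialize);
            Callable<Boolean> check = ThreadLocalStateSketch::isInitialized;
            return executor.submit(check).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    private static void runAndJoin(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("another thread sees it: " + visibleFromAnotherThread());
        System.out.println("same thread sees it:    " + visibleOnSameExecutorThread());
    }
}
```

State set up on one thread is missing on the next fresh thread but still present on a reused executor thread, which is the same pattern as the crashes described in the question.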

Jim Rush
Thanks. It's a Windows .dll indeed, and the COM object related explanation makes much sense. I didn't know that.
Joonas Pulakka