I have a Java program that runs many small simulations. It runs a genetic algorithm, where each fitness function is a simulation using parameters on each chromosome. Each one takes maybe 10 or so seconds if run by itself, and I want to run a pretty big population size (say 100?). I can't start the next round of simulations until the previous one has finished. I have access to a machine with a whack of processors in it and I'm wondering if I need to do anything to make the simulations run in parallel. I've never written anything explicitly for multicore processors before and I understand it's a daunting task.

So this is what I would like to know: To what extent and how well does the JVM parallelize? I have read that it creates low-level threads, but how smart is it? How efficient is it? Would my program run faster if I made each simulation a thread? I know this is a huge topic, but could you point me towards some introductory literature concerning parallel processing and Java?

Thanks very much!

Update: Ok, I've implemented an ExecutorService and made my small simulations implement Runnable and have run() methods. Instead of writing this:

Simulator sim = new Simulator(args); 
sim.play(); 
return sim.getResults();

I write this in my constructor:

ExecutorService executor = Executors.newFixedThreadPool(32);

And then each time I want to add a new simulation to the pool, I run this:

RunnableSimulator rsim = new RunnableSimulator(args);
executor.execute(rsim);
return rsim.getResults();

The RunnableSimulator::run() method calls the Simulator::play() method; neither takes arguments.

I think I am getting thread interference, because now the simulations error out. By error out I mean that variables hold values that they really shouldn't. No code within the simulation was changed, and before this the simulation ran perfectly over many, many different arguments. The sim works like this: each turn it's given a game piece and loops through all the locations on the game board. It checks to see if the location given is valid, and if so, commits the piece and measures that board's goodness. Now, obviously invalid locations are being passed to the commit method, resulting in index-out-of-bounds errors all over the place.

Each simulation is its own object right? Based on the code above? I can pass the exact same set of arguments to the RunnableSimulator and Simulator classes and the runnable version will throw exceptions. What do you think might cause this and what can I do to prevent it? Can I provide some code samples in a new question to help?

+10  A: 

Java Concurrency Tutorial

If you're just spawning a bunch of stuff off to different threads, and it isn't going to be talking back and forth between different threads, it isn't too hard; just write each in a Runnable and pass them off to an ExecutorService.

You should skim the whole tutorial, but for this particular task, start here.

Basically, you do something like this:

ExecutorService executorService = Executors.newFixedThreadPool(n);

where n is the number of things you want running at once (usually the number of CPUs). Each of your tasks should be an object that implements Runnable, and you then execute it on your ExecutorService:

executorService.execute(new SimulationTask(parameters...));

Executors.newFixedThreadPool(n) will start up n threads, and execute will insert the tasks into a queue that feeds to those threads. When a task finishes, the thread it was running on is no longer busy, and the next task in the queue will start running on it. Execute won't block; it will just put the task into the queue and move on to the next one.

The thing to be careful of is that you really AREN'T sharing any mutable state between tasks. Your task classes shouldn't depend on anything mutable that will be shared among them (e.g. static data). There are ways to deal with shared mutable state (locking), but if you can avoid the problem entirely it will be a lot easier.
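To make the steps above concrete, here is a minimal sketch. SimulationTask is the name used above, but its seed parameter, the arithmetic standing in for the real simulation, and the FixedPoolSketch class are all made up for illustration. Every piece of state lives in instance fields, so nothing is shared between tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one simulation: no static fields, no shared objects.
class SimulationTask implements Runnable {
    private final int seed;        // input, never modified
    private volatile long result;  // written by the worker thread, read by the caller afterwards

    SimulationTask(int seed) {
        this.seed = seed;
    }

    @Override
    public void run() {
        long acc = seed;
        for (int i = 0; i < 1_000; i++) {
            acc = acc * 31 + i;    // placeholder for the real simulation work
        }
        result = acc;
    }

    long getResult() {
        return result;
    }
}

class FixedPoolSketch {
    // Runs taskCount tasks on a pool of the given size and waits for all of them.
    static long[] runAll(int taskCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        SimulationTask[] tasks = new SimulationTask[taskCount];
        for (int i = 0; i < taskCount; i++) {
            tasks[i] = new SimulationTask(i);
            pool.execute(tasks[i]);    // queues the task and returns immediately
        }
        pool.shutdown();               // accept no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("simulations did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long[] results = new long[taskCount];
        for (int i = 0; i < taskCount; i++) {
            results[i] = tasks[i].getResult();  // safe to read: every task has finished
        }
        return results;
    }

    public static void main(String[] args) {
        long[] results = runAll(8, 4);
        System.out.println("finished " + results.length + " simulations");
    }
}
```

Note that the results are only read after awaitTermination has returned. Reading them straight after execute() would race with the worker thread.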

EDIT: Reading your edits to your question, it looks like you really want something a little different. Instead of implementing Runnable, implement Callable. Your call() method should be pretty much the same as your current run(), except it should return getResults();. Then, submit() it to your ExecutorService. You will get a Future in return, which you can use to test if the simulation is done, and, when it is, get your results.
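A rough sketch of the Callable version, assuming a chromosome can be represented as a double[] and using a sum of squares as a placeholder fitness. CallableSimulator, FutureSketch, and evaluate are hypothetical names, not anything from your code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical simulation that returns its result from call() instead of storing it.
class CallableSimulator implements Callable<Double> {
    private final double[] chromosome;

    CallableSimulator(double[] chromosome) {
        this.chromosome = chromosome.clone();  // private copy, so nothing is shared
    }

    @Override
    public Double call() {
        double fitness = 0.0;
        for (double gene : chromosome) {
            fitness += gene * gene;            // placeholder for the real simulation
        }
        return fitness;
    }
}

class FutureSketch {
    // Submits one simulation per chromosome, then collects the results in order.
    static List<Double> evaluate(List<double[]> population, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (double[] chromosome : population) {
                pending.add(executor.submit(new CallableSimulator(chromosome)));
            }
            List<Double> fitness = new ArrayList<>();
            for (Future<Double> future : pending) {
                fitness.add(future.get());     // blocks until that simulation is done
            }
            return fitness;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("simulation failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

The important difference from the code in the question is that all simulations are submitted first and get() is called only afterwards. Calling get() (or getResults()) right after submitting each one would either run them one at a time or read results that do not exist yet.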

Adam Jaskiewicz
Thanks, good sir. So when I read about low-level threads, I should think of them as things beyond my concern and control? If I want any multithreading, I need to implement it myself?
hornairs
Added a bit more clarification about what is going on. It is multithreading underneath, but you don't deal with it directly. You just say "these are independent tasks; run them at the same time".
Adam Jaskiewicz
Check the API: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ExecutorService.htm You write "Callables" that do your processing, then you can call myResults = executorService.invokeAll(yourListOfCallables) and get a list of Futures holding the results when everything is done. Make sure that the Callables do not use static variables etc.
KarlP
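The invokeAll approach from the comment above fits a genetic algorithm well, since a generation cannot start until the previous one has finished. Note that ExecutorService.invokeAll takes a collection of Callables and returns a list of Futures in the same order as the tasks. A minimal sketch, where InvokeAllSketch, runGeneration, and the squaring "simulation" are placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {
    // Runs one "generation": one task per input, returning only when all are done.
    static List<Integer> runGeneration(List<Integer> inputs, int threads) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (final Integer input : inputs) {
            tasks.add(() -> input * input);    // placeholder for a real simulation
        }
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Integer> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("simulation failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

In a real program you would probably keep one executor for the whole run rather than creating a new pool per generation; it is created inside the method here only to keep the sketch self-contained.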
+2  A: 

You can also look at the new fork/join framework by Doug Lea. One of the best books on the subject is certainly Java Concurrency in Practice. I would strongly recommend that you take a look at the fork/join model.
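For a feel of what the fork/join model looks like, here is a small sketch using the classes that later shipped in java.util.concurrent (ForkJoinPool and RecursiveTask, Java 7 and up; commonPool() needs Java 8). Summing an array is just a stand-in problem, and SumTask, ForkJoinSketch, and the threshold of 1,000 are arbitrary choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums data[from, to) by splitting the range in half until it is small enough.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // arbitrary cut-off for this sketch
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                       // schedule the left half asynchronously
        long rightSum = right.compute();   // work on the right half in this thread
        return left.join() + rightSum;     // wait for the left half and combine
    }
}

class ForkJoinSketch {
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Fork/join pays off when one large problem splits recursively into subproblems. For a fixed batch of independent simulations, a plain ExecutorService is usually the simpler fit.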

dfa
A: 

If you are doing full-out processing all the time in your threads, you won't benefit from having more threads than processors. If your threads occasionally wait on each other or on the system, then Java scales well up to thousands of threads.
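For the CPU-bound case, one common way to pick the pool size is to ask the JVM how many processors it sees. A minimal sketch (PoolSizing and its method names are made up for this example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    // Number of processors available to the JVM; always at least 1.
    static int cpuBoundPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    // One worker thread per processor, suitable for tasks that never wait on I/O.
    static ExecutorService newCpuBoundPool() {
        return Executors.newFixedThreadPool(cpuBoundPoolSize());
    }

    public static void main(String[] args) {
        System.out.println("worker threads: " + cpuBoundPoolSize());
        ExecutorService pool = newCpuBoundPool();
        pool.shutdown();
    }
}
```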

I wrote an app that discovered a class B network (65,000 addresses) in a few minutes by pinging each node, and each ping had retries with an increasing delay. When I put each ping on a separate thread (this was before NIO; I could probably improve it now), I could run about 4000 threads on Windows before things started getting flaky. On Linux the number was nearer 1000 (never figured out why).

No matter what language or toolkit you use, if your data interacts, you will have to pay some attention to those areas where it does. Java uses the synchronized keyword to prevent two threads from executing a section of code at the same time. If you write your Java in a more functional manner (making all your members final) you can run without synchronization, but solving problems takes a different approach that way.
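To illustrate both styles, here is a small sketch: a counter that is shared between threads and therefore guarded with synchronized, next to an immutable value class that needs no locking. All the names (SharedCounter, BoardScore, SynchronizedSketch) are invented for the example.

```java
// Mutable state shared between threads: every access goes through synchronized methods.
class SharedCounter {
    private long count;

    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }
}

// The "functional" alternative: all fields final, so instances can be shared freely.
final class BoardScore {
    private final int location;
    private final double goodness;

    BoardScore(int location, double goodness) {
        this.location = location;
        this.goodness = goodness;
    }

    int location() {
        return location;
    }

    double goodness() {
        return goodness;
    }

    // "Modifying" a score means creating a new object; the original never changes.
    BoardScore withGoodness(double newGoodness) {
        return new BoardScore(location, newGoodness);
    }
}

class SynchronizedSketch {
    // Starts several threads that all increment the same counter, then waits for them.
    static long countInParallel(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Without synchronized on increment(), concurrent count++ operations could overwrite each other and the final total could come out lower than expected.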

Java has other tools to manage units of independent work; look in the java.util.concurrent package for more information.

Bill K
"Linux the number was nearer 1000 (Never figured out why)" - I think I had that same issue once and found out that the Linux kernel has a (configurable) maximum number of OS threads, with a default value of 1024.
Michael Borgwardt
A: 

Java is pretty good at parallel processing, but there are two caveats:

  • Java threads are relatively heavyweight (compared with e.g. Erlang), so don't start creating them in the hundreds or thousands. Each thread gets its own stack memory (the default depends on the platform and JVM, typically somewhere between a few hundred KB and 1 MB, and can be changed with -Xss) and you could run out of memory, among other things.
  • If you run on a very powerful machine (especially with a lot of CPUs and a large amount of RAM), then the VM's default settings (especially concerning GC) may result in suboptimal performance and you may have to spend some time tuning them via command-line options. Unfortunately, this is not a simple task and requires a lot of knowledge.
Michael Borgwardt