Hi all, I have two thread pools:

ioThreads = (ThreadPoolExecutor)Executors.newCachedThreadPool();

cpuThreads = (ThreadPoolExecutor)Executors.newFixedThreadPool(numCpus);

I have a simple web crawler. I want an IO thread to take a URL, fetch it, and hand the contents over to a CPU thread for processing; the IO thread then fetches another URL, and so on.

At some point the IO threads will have no new pages to crawl, and I want to record in my database that this session is complete. How can I best tell when all the threads are done processing and the program can end?

+2  A: 

A typical way would be to use common (volatile or synchronized) boolean flag(s) to communicate between the threads. When the IO thread is finished, it flips the flag. The other thread checks the flag value in a loop, and when it sees the changed value, it exits the loop and terminates.
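A minimal sketch of the flag approach, assuming a single worker that polls a `volatile` field. The names `VolatileFlagDemo`, `ioFinished` and `cpuWorker` are made up for illustration, and the `sleep` only stands in for the IO thread doing its last fetches:

```java
class VolatileFlagDemo {
    // Written by the IO side, read by the worker; volatile makes the write visible.
    private static volatile boolean ioFinished;

    static boolean run() {
        ioFinished = false;
        Thread cpuWorker = new Thread(() -> {
            while (!ioFinished) {
                Thread.yield(); // real code would process queued pages here
            }
            // Falling out of run() is what terminates the thread.
        });
        cpuWorker.start();
        try {
            Thread.sleep(50);   // stands in for the IO thread fetching its last pages
            ioFinished = true;  // flip the flag: no more work is coming
            cpuWorker.join();   // wait for the worker to notice and exit
        } catch (InterruptedException e) {
            ioFinished = true;  // still let the worker exit
            Thread.currentThread().interrupt();
        }
        return !cpuWorker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker terminated: " + run());
    }
}
```

Note that a bare flag only says "no more input"; the worker would still have to finish whatever is already queued before returning.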

If you use the producer-consumer model with a work queue between the IO threads and the processing threads, another possibility would be to pass a special "end of processing" token to the queue, which would signal to the processors that they can terminate.
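The "end of processing" token could look roughly like this. It is only a sketch with one producer and one consumer; `PoisonPillDemo`, `END_OF_PROCESSING` and the upper-casing "processing" step are placeholders, not anything from the original crawler:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PoisonPillDemo {
    // Hypothetical sentinel; compared by identity so no real page can match it.
    private static final String END_OF_PROCESSING = new String("END");

    static List<String> run(List<String> pages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String page = queue.take();      // blocks until something arrives
                    if (page == END_OF_PROCESSING) {
                        return;                      // token seen: leave run()
                    }
                    processed.add(page.toUpperCase()); // stands in for real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String page : pages) {
                queue.put(page);                     // producer side (the IO thread)
            }
            queue.put(END_OF_PROCESSING);            // nothing more will be produced
            consumer.join();                         // after join, 'processed' is safe to read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b"))); // prints [A, B]
    }
}
```

With several consumers on the same queue, one token per consumer would be needed, since each consumer swallows the token it takes.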

Péter Török
how does the thread actually terminate?
Just `return` from the `run()` method.
BalusC
Pretty pointless. The other thread could just call join() with a 1ms timeout in a loop, no need for booleans at all.
EJP
@EJP, call `join()` on what? With a producer-consumer model, the different threads don't even see each other... :-/
Péter Török
I am referring to the first part of your suggestion 'use common (volatile or synchronized) boolean flag(s) to communicate between'. You don't need booleans for that, just join().
EJP
@EJP, in that part, you may be right. Years ago I learnt in a Java multithreaded course that `join()` is unreliable, that's why I suggested boolean flags. But that may have changed since, as "Concurrent Programming in Java" mentions `join()` as a valid solution (although very briefly). Still, I wouldn't dare to say `join()` is all you ever need :-)
Péter Török
I didn't say it was all you ever need. I said you don't need a boolean instead of join(). And I am very suspicious of unsourced allegations about 'join() is unreliable'. I've been using Java for over 13 years without noticing that.
EJP
@EJP, sorry, it should have been "that may have changed since, _or it may have been incorrect_ ". And well, if "no need for booleans at all" does not mean that (in your opinion) `join()` can solve all such problems, then what is it supposed to mean? :-)
Péter Török
+1  A: 

You can wait for all threads to finish using a CyclicBarrier, for example: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/CyclicBarrier.html
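One way this might look, assuming a fixed, known number of workers. The barrier is created with one extra party so the waiting (main) thread can join it too; `BarrierDemo` and the counter are illustrative only:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierDemo {
    static int run(int workers) {
        // workers + 1 parties: every worker plus the thread that waits for them.
        CyclicBarrier barrier = new CyclicBarrier(workers + 1);
        AtomicInteger done = new AtomicInteger();
        // The pool needs at least 'workers' threads, because each task blocks on the barrier.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    done.incrementAndGet();      // stands in for the real work
                    try {
                        barrier.await();         // signal "I am finished"
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (BrokenBarrierException e) {
                        // another party gave up; nothing more to do in this sketch
                    }
                });
            }
            barrier.await();                     // returns once all workers have arrived
            return done.get();
        } catch (InterruptedException | BrokenBarrierException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(4) + " workers finished"); // prints "4 workers finished"
    }
}
```

A barrier needs the party count up front, so it fits a fixed set of workers better than a crawler whose amount of work is only discovered along the way.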

Anton
A: 

Lay out the program logic. Store URLs in a `java.util.Stack` object (its individual methods are synchronized, so single push/pop calls are thread-safe).

If

a. there are no more URLs on the stack
b. there are no more crawler threads running
c. there are no more CPU/processing threads running

then the program can write to the DB and exit.
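Checking the three conditions one after another can race (a thread may pop a URL between two checks), so the sketch below swaps in a single counter of URLs that are queued or in flight; when it drops to zero, a, b and c all hold at once. This is only an illustration: `CrawlSession`, `links` (a fake in-memory "web") and `pending` are assumed names, and the map lookup stands in for a real fetch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class CrawlSession {
    private final ExecutorService ioThreads = Executors.newCachedThreadPool();
    private final ExecutorService cpuThreads =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final AtomicInteger pending = new AtomicInteger(); // URLs queued or in flight
    private final CountDownLatch allDone = new CountDownLatch(1);
    private final Map<String, List<String>> links;             // hypothetical fake web

    CrawlSession(Map<String, List<String>> links) {
        this.links = links;
    }

    private void submit(String url) {
        if (!seen.add(url)) {
            return;                                  // already crawled or scheduled
        }
        pending.incrementAndGet();                   // counted before any work starts
        ioThreads.execute(() -> {
            List<String> found =
                    links.getOrDefault(url, Collections.<String>emptyList()); // "fetch"
            cpuThreads.execute(() -> {
                for (String next : found) {
                    submit(next);                    // children are counted first...
                }
                if (pending.decrementAndGet() == 0) { // ...then this URL is released
                    allDone.countDown();
                }
            });
        });
    }

    /** Crawls from the seed and returns the number of distinct URLs visited. */
    int crawl(String seed) {
        submit(seed);
        try {
            allDone.await();                         // the point to update the DB and exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            ioThreads.shutdown();
            cpuThreads.shutdown();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = new HashMap<>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("c", "a"));
        System.out.println(new CrawlSession(web).crawl("a")); // prints 3
    }
}
```

Because every discovered URL is counted before its parent is released, the counter cannot reach zero while work is still outstanding.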

matiasf