views:

474

answers:

3

I've written a program that counts lines, words, and characters in a text: it does this with threads. It works great sometimes, but not so great other times. What ends up happening is the variables pointing to the number of words and characters counted sometimes come up short and sometimes don't.

It seems to me that the threads are sometimes ending before they can count all the words or characters that they want to. Is it because these threads go out of scope when the while (true) loop breaks?

I've included the code from the thready part of my problem below:

private void countText() {
  try {
    reader = new BufferedReader(new FileReader("this.txt"));
    while (true) {
      final String line = reader.readLine();
      if(line == null) {break;}
      lines++;
      new Thread(new Runnable() {public void run() {chars += characterCounter(line);}}).start();
      new Thread(new Runnable() {public void run() {words += wordCounter(line);}}).start();
      println(line);
    }

  } catch(IOException ex) {return;}

}

(Sub Question: This is the first time I've asked about something and posted code. I don't want to use StackOverflow in place of google and wikipedia and am worried that this isn't an appropriate question? I tried to make the question more general so that I'm not just asking for help with my code... but, is there another website where this kind of question might be more appropriate?)

+2  A: 

Sounds like a good question to me... I think the problem might be related to the atomicity of the chars += and words += - several threads could be calling that at the same time - do you do anything to ensure that there is no interleaving.

That is:

Thread 1, has chars = 10, wants to add 5

Thread 2, has chars = 10, wants to add 3

Thread 1 works out new total, 15

Thread 2 works out new total, 13

Thread 1 sets chars to 15

Thread 2 sets chars to 13.

Might be possible unless you use synchronized when updating those vars.

Chris Kimpton
Aha! You see, I totally learned about interleaving and atomicity and synchronized and locks, but that still would never have occurred to me . That is exactly the problem, no doubt!
Ziggy
Hmm... I used synchronized (this) {around the += stuff} but am still getting unpredictable results...
Ziggy
oh man, I don't think that's it. I added a println(Thread.activeCount()); that would give me a sense of what was going on. It seems like I only sometimes get the full 12 threads active before the while loop ends. That's the problem: not enough time!
Ziggy
My guess is the BufferedReader loads the whole file into memory, if it's a small file, so the actual I/O of reading through the file will be very quick. So you zip through the while loop in a flash and your threads are still starting up by the time you leave the loop.
Sam Stokes
+4  A: 

As Chris Kimpton already pointed out correctly you have a problem with the updating of chars and words in different threads. Synchronizing on this won't work either because this is a reference to the current thread which means different threads will synchronize on different objects. You could use an extra "lock object" you can synchronize on but the easiest way to fix this would probably be to use AtomicIntegers for the 2 counters:

AtomicInteger chars = new AtomicInteger();
...
new Thread(new Runnable() {public void run() { chars.addAndGet(characterCounter(line));}}).start();
...

While this will probably fix your problem, Sam Stoke's more detailed answer is completely right, the original design is very inefficient.

To answer your question about when a thread "goes out of scope": You are starting two new threads for every line in your file and all of them will run until they reach the end of their run() method. This is unless you make them daemon threads)), in that case they'll exit as soon as daemon threads are the only ones still running in this JVM.

WMR
I implemented the AtomicIntegers, and that increased my success rate. There are still runs in which both counts are lower than they should be...
Ziggy
You're probably not waiting for all the threads to complete before printing the result. See my answer below.
Sam Stokes
+7  A: 

A different threaded design would make it easier to find and fix this kind of problem, and be more efficient into the bargain. This is a longish response, but the summary is "if you're doing threads in Java, check out java.util.concurrent as soon as humanly possible)".

I guess you're multithreading this code to learn threads rather than to speed up counting words, but that's a very inefficient way to use threads. You're creating two threads per line - two thousand threads for a thousand line file. Creating a thread (in modern JVMs) uses operating system resources and is generally fairly expensive. When two - let alone two thousand - threads have to access a shared resource (such as your chars and words counters), the resulting memory contention also hurts performance.

Making the counter variables synchronized as Chris Kimpton suggests or Atomic as WMR suggests will probably fix the code, but it will also make the effect of contention much worse. I'm pretty sure it will go slower than a single-threaded algorithm.

I suggest having just one long-lived thread which looks after chars, and one for words, each with a work queue to which you submit jobs each time you want to add a new number. This way only one thread is writing to each variable, and if you make changes to the design it'll be more obvious who's responsible for what. It'll also be faster because there's no memory contention and you're not creating hundreds of threads in a tight loop.

It's also important, once you've read all the lines in the file, to wait for all the threads to finish before you actually print out the values of the counters, otherwise you lose the updates from threads that haven't finished yet. With your current design you'd have to build up a big list of threads you created, and run through it at the end checking that they're all dead. With a queue-and-worker-thread design you can just tell each thread to drain its queue and then wait until it's done.

Java (from 1.5 and up) makes this kind of design very easy to implement: check out java.util.concurrent.Executors.newSingleThreadExecutor. It also makes it easy to add more concurrency later on (assuming proper locking etc), as you can just switch to a thread pool rather than a single thread.

Sam Stokes
I haven't been waiting for the threads to finish. You're right, I'm just doing this to get a hang of the methods I will use with threads: the assignment didn't require threads at all. How do you wait for a thread to finish? Could I just wait for Thread.activeCount() to return a smallish number?
Ziggy
Thread.join() waits for a single thread to die. Waiting for the thread count to equal 1 might work - I suspect you might run into race conditions with threads that are in the process of starting up, but I'm not sure.
Sam Stokes
If you want to get the hang of threads, I do recommend looking into the Executor / thread pool / work queue way of doing things. Once you get your head around it, it's actually a lot easier to reason about than creating threads manually.
Sam Stokes
OK! I'll look into before I try again (this counting down to 1 thing worked. Actually counting down to 4 worked... for some reason! Anyway, it worked, on to the next prob... erm, program!).
Ziggy