views:

218

answers:

6

Under what circumstances would an unsynchronized collection, say an ArrayList, cause a problem? I can't think of any, can someone please give me an example where an ArrayList causes a problem and a Vector solves it? I wrote a program that have 2 threads both modifying an arraylist that has one element. One thread puts "bbb" into the arraylist while the other puts "aaa" into the arraylist. I don't really see an instance where the string is half modified, I am on the right track here?

Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?

Many Thanks in advance.

+1  A: 

When will it cause trouble?

Anytime that a thread is reading the ArrayList and the other one is writing, or when they are both writing. Here's a very known example.

Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?

Yes, Single core cpus can execute only one instruction at a time (not really, pipelining has been here for a while, but as a professor once said, thats "free" parallelism). Even though, each process running in your computer is only executed for a period of time, then it goes to an idle state. In that moment, another process may start/continue its execution. And then go into an idle state or finish. Processes execution are interleaved.

With threads the same thing happens, only that they are contained inside a process. How they execute is dependant on the Operating System, but the concept remains the same. They change from active to idle constantly through their lifetime.

Tom
+2  A: 

I don't really see an instance where the string is half modified, I am on the right track here?

That won't happen. However, what could happen is that only one of the strings gets added. Or that an exception occurs during the call to add.

can someone please give me an example where an ArrayList causes a problem and a Vector solves it?

If you want to access a collection from multiple threads, you need to synchronize this access. However, just using a Vector does not really solve the problem. You will not get the issues described above, but the following pattern will still not work:

 // broken, even though vector is "thread-safe"
 if (vector.isEmpty())
    vector.add(1);

The Vector itself will not get corrupted, but that does not mean that it cannot get into states that your business logic would not want to have. You need to synchronize in your application code (and then there is no need to use Vector).

synchronized(list){
   if (list.isEmpty())
     list.add(1);
}

The concurrency utility packages also has a number of collections that provide atomic operations necessary for thread-safe queues and such.

Thilo
+12  A: 

A practical example. At the end list should contain 40 items, but for me it usually shows between 30 and 35. Guess why?

static class ListTester implements Runnable {
    private List<Integer> a;

    public ListTester(List<Integer> a) {
        this.a = a;
    }

    public void run() {
        try {
            for (int i = 0; i < 20; ++i) {
                a.add(i);
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
        }
    }
}


public static void main(String[] args) throws Exception {
    ArrayList<Integer> a = new ArrayList<Integer>();

    Thread t1 = new Thread(new ListTester(a));
    Thread t2 = new Thread(new ListTester(a));

    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println(a.size());
    for (int i = 0; i < a.size(); ++i) {
        System.out.println(i + "  " + a.get(i));
    }
}

edit
There're more comprehensive explanations around (for example, Stephen C's post), but I'll make a little comment since mfukar asked. (should've done it right away, when posting answer)

This is the famous problem of incrementing integer from two different threads. There's a nice explanation in Sun's Java tutorial on concurrency. Only in that example they have --i and ++i and we have ++size twice. (++size is part of ArrayList#add implementation.)

Nikita Rybak
+1 nice example. race-condition errors are usually not able to reproduce and such examples are great help for novice
Hemal Pandya
Why? A fair example, but your answer is incomplete.
Michael Foukarakis
@mfukar Thanks, I just posted it and went to sleep :) Now link added.
Nikita Rybak
+1  A: 

The first part of youe query has been already answered. I will try to answer the second part :

Also, I remember that I was told that multiple threads are not really running simultaneously, 1 thread is run for sometime and another thread runs after that(on computers with a single CPU). If that was correct, how could two threads ever access the same data at the same time? Maybe thread 1 will be stopped in the middle of modifying something and thread 2 will be started?

in wait-notify framework, the thread aquiring the lock on the object releases it when waiting on some condition. A great example is the producer-consumer problem. See here: link text

Amit
+7  A: 

There are three aspects of what might go wrong if you use an ArrayList (for example) without adequate synchronization.

The first scenario is that if two threads happen to update the ArrayList at the same time, then it may get corrupted. For instance, the logic of appending to a list goes something like this:

public void add(T element) {
    if (!haveSpace(size + 1)) {
        expand(size + 1);
    }
    elements[size] = element;
    // HERE
    size++;
}

Now suppose that we have one processor / core and two threads executing this code on the same list at the "same time". Suppose that the first thread gets to the point labeled HERE and is preempted. The second thread comes along, and overwrites the slot in elements that the first thread just updated with its own element, and then increments size. When the first thread finally gets control, it updates size. The end result is that we've added the second thread's element and not the first thread's element, and most likely also added a null to the list. (This is just illustrative. In reality, the native code compiler may have reordered the code, and so on. But the point is that bad things can happen if updates happen simultaneously.)

The second scenario arises due to the caching of main memory contents in the CPU's cache memory. Suppose that we have two threads, one adding elements to the list and the second one reading the list's size. When on thread adds an element, it will update the list's size attribute. However, since size is not volatile, the new value of size may not immediately be written out to main memory. Instead, it could sit in the cache until a synchronization point where the Java memory model requires that cached writes get flushed. In the meantime, the second thread could call size() on the list and get a stale value of size. In the worst case, the second thread (calling get(int) for example) might see inconsistent values of size and the elements array, resulting in unexpected exceptions. (Note that kind of problem can happen even when there is only one core and no memory caching. The JIT compiler is free to use CPU registers to cache memory contents, and those registers don't get flushed / refreshed with respect to their memory locations when a thread context switch occurs.)

The third scenario arises when you synchronize operations on the ArrayList; e.g. by wrapping it as a SynchronizedList.

    List list = Collections.synchronizedList(new ArrayList());

    // Thread 1
    List list2 = ...
    for (Object element : list2) {
        list.add(element);
    }

    // Thread 2
    List list3 = ...
    for (Object element : list) {
        list3.add(element);
    }

If thread2's list is an ArrayList or LinkedList and the two threads run simultaneously, thread 2 will fail with a ConcurrentModificationException. If it is some other (home brew) list, then the results are unpredictable. The problem is that making list a synchronized list is NOT SUFFICIENT to make it thread-safe with respect to a sequence of list operations performed by different threads. To get that, the application would typically need to synchronize at a higher level / coarser grain.

Stephen C
A: 

You cannot control when one thread will be stopped and other will start. Thread 1 will not wait until it has completely finished adding data. There is always possible to corrupt data.

fastcodejava