views:

636

answers:

6

I have a List object being accessed by multiple threads. Mostly one thread, and under some conditions two threads, update the list. One to five threads can read from the list, depending on the number of user requests being processed. The list is not a queue of tasks to perform; it is a list of domain objects that are being retrieved and updated concurrently.

Now there are several ways to make the access to this list thread-safe:
- use a synchronized block
- use a normal Lock (i.e. read and write ops share the same lock)
- use a ReadWriteLock
- use one of the new ConcurrentBLABLBA collection classes

My question:
What is the optimal approach to use, given that the critical sections typically do not contain many operations (mostly just adding, removing, inserting or getting elements from the list)?
Can you recommend another approach, not listed above?

Some constraints:
- optimal performance is critical, memory usage not so much
- it must be an ordered list (currently synchronizing on an ArrayList), although not a sorted list (i.e. not sorted using a Comparable or Comparator, but according to insertion order)
- the list is big, containing up to 100000 domain objects, so using something like CopyOnWriteArrayList is not feasible
- the write/update critical sections are typically very quick, doing a simple add/remove/insert or replace (set)
- the read operations will primarily do an elementAt(index) call most of the time, although some read operations might do a binary search or indexOf(element)
- no direct iteration over the list is done, though an operation like indexOf(..) will traverse the list
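For concreteness, the ReadWriteLock option mentioned above might be sketched like this (the wrapper class and method names are illustrative only, not part of the original question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative wrapper: many readers may hold the read lock at once,
// while each writer takes the exclusive write lock for its short update.
class RwLockedList<T> {
    private final List<T> list = new ArrayList<T>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void add(T element) {
        lock.writeLock().lock();
        try {
            list.add(element);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public T get(int index) {
        lock.readLock().lock();
        try {
            return list.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    public int indexOf(T element) {
        lock.readLock().lock();
        try {
            return list.indexOf(element);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Whether this beats a plain synchronized block depends on how long reads actually hold the lock; with very short critical sections the extra bookkeeping of a ReadWriteLock can cancel out the benefit of concurrent readers.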

+1  A: 

What are the reading threads doing? If they're iterating over the list, then you really need to make sure no-one touches the list during the whole of the iteration process, otherwise you could get very odd results.

If you can define precisely what semantics you need, it should be possible to solve the issue - but you may well find that you need to write your own collection type to do it properly and efficiently. Alternatively, CopyOnWriteArrayList may well be good enough - if potentially expensive. Basically, the more you can tie down your requirements, the more efficient it can be.

Jon Skeet
I had a look at the CopyOnWriteArrayList, but that would be too expensive to use. The list can potentially have 100000 elements, and gets updated a lot.
Herman Lintvelt
Okay, in which case you'll need to really work out your semantics carefully. Are these updates doing inserts/deletes, or just replacing elements? If they're adding/deleting, are they doing so at the head/tail of the list? (In which case a linked list could really help you.)
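For example, if the edits really are confined to the head and tail, a lock-free deque such as ConcurrentLinkedDeque (available in newer JDKs; shown here purely as an illustration, and note it offers no indexed access) supports exactly that pattern:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

// Lock-free deque: safe concurrent adds/removes at both ends,
// but no get(i), so it only fits if reads don't need indexed access.
ConcurrentLinkedDeque<String> recent = new ConcurrentLinkedDeque<String>();
recent.addLast("older");
recent.addLast("newer");
recent.addFirst("oldest");

String head = recent.pollFirst(); // removes and returns "oldest"
```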
Jon Skeet
+3  A: 

Do you have to use a sequential list? If a map-type structure is more appropriate, you can use a ConcurrentHashMap. With a list, a ReadWriteLock is probably the most effective way.

Edit to reflect OP's edit: Binary search on insertion order? Do you store a timestamp and use that for comparison, in your binary search? If so, you may be able to use the timestamp as the key, and ConcurrentSkipListMap as the container (which maintains key order).
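A rough sketch of that idea, assuming each domain object carries a unique timestamp that can serve as the key (the string values here are placeholders for the real domain objects):

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Keys are timestamps, so the map keeps entries in insertion-time
// order without any external locking.
ConcurrentNavigableMap<Long, String> records =
        new ConcurrentSkipListMap<Long, String>();

records.put(1000L, "first");
records.put(3000L, "third");
records.put(2000L, "second");

// Ordered, thread-safe lookups replace the binary search over a list:
String earliest = records.firstEntry().getValue();        // "first"
String atOrBefore = records.floorEntry(2500L).getValue(); // "second"
```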

Chris Jester-Young
I like the ConcurrentSkipListMap idea. In 90% of the time the list is sorted according to some timestamp (part of the ID of each domain object), so it is probably worth it optimising for that. Will still think about the other 10%.
Herman Lintvelt
A: 

I second Chris that a ConcurrentHashMap makes better sense if that will work for you.

anjanb
+1  A: 

I don't know if this is a possible solution for the problem, but it makes sense to me to use a database manager to hold that huge amount of data and let it manage the transactions.

Telcontar
The data is managed on the server side by the DB manager and app-server layer, but we need to somehow display it to the end user. The management of this list happens on the client side, where the data is retrieved.
Herman Lintvelt
+1  A: 

I second Telcontar's suggestion of a database, since they are actually designed for managing this scale of data and negotiating between threads, while in-memory collections are not.

You say that the data is in a database on the server, and the local list on the clients is for the sake of the user interface. You shouldn't need to keep all 100000 items on the client at once, or perform such complicated edits on it. It seems to me that what you want on the client is a lightweight cache of the database.

Write a cache that stores only the current subset of data on the client at once. This client cache does not perform complex multithreaded edits on its own data; instead it feeds all edits through to the server and listens for updates. When data changes on the server, the client simply forgets the old data and loads it again. Only one designated thread is allowed to read or write the collection itself. This way the client simply mirrors the edits happening on the server, rather than needing complicated edits itself.

Yes, this is quite a complicated solution. The components of it are:

  • A protocol for loading a range of the data, say items 478712 to 478901, rather than the whole thing
  • A protocol for receiving updates about changed data
  • A cache class that stores items by their known index on the server
  • A thread belonging to that cache which communicates with the server. This is the only thread that writes to the collection itself
  • A thread belonging to that cache which processes callbacks when data is retrieved
  • An interface that UI components implement to allow them to receive data when it has been loaded

At first stab, the bones of this cache might look something like this:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

class ServerCacheViewThingy {
    private static final int ACCEPTABLE_SIZE = 500;
    private int viewStart, viewLength;
    final Map<Integer, Record> items
            = new HashMap<Integer, Record>(1000);
    final ConcurrentLinkedQueue<Callback> callbackQueue
            = new ConcurrentLinkedQueue<Callback>();

    public void getRecords (int start, int length, ViewReceiver receiver) {
        // remember the current view, to prevent records within
        // this view from being accidentally pruned.
        viewStart = start;
        viewLength = length;

        // if the selected area is not already loaded, send a request
        // to load that area
        if (!rangeLoaded(start, length))
            addLoadRequest(start, length);

        // add the receiver to the queue, so it will be processed
        // when the data has arrived
        if (receiver != null)
            callbackQueue.add(new Callback(start, length, receiver));
    }

    class Callback {
        int start;
        int length;
        ViewReceiver receiver;
        ...
    }

    class EditorThread extends Thread {

        private void prune () {
            if (items.size() <= ACCEPTABLE_SIZE)
                return;
            for (Map.Entry<Integer, Record> entry : items.entrySet()) {
                int position = entry.getKey();
                // if the position is outside the current view,
                // remove that item from the cache
                ...
            }
        }

        private void markDirty (int from) { ... }

        ....
    }

    class CallbackThread extends Thread {
        public void notifyCallback (Callback callback) { ... }
        private void processCallback (Callback callback) {
            // read the requested records and hand them to callback.receiver
            ...
        }
    }
}

interface ViewReceiver {
    void receiveData (int viewStart, Record[] records);
    void receiveTimeout ();
}

There's a lot of detail you'll have to fill in for yourself, obviously.

Marcus Downing
Thank you for your answer. The up to 100000 items is the "page" of data we are retrieving to the client. The DB can contain billions of entries. What I would consider seriously is your advice on accessing the list with only one thread. That would definitely simplify things.
Herman Lintvelt
A: 

You can use a wrapper that implements synchronization:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

ArrayList list = new ArrayList();
List syncList = Collections.synchronizedList(list);

// make sure you only use syncList for your future calls...

This is an easy solution. I'd try this before resorting to more complicated solutions.
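One caveat from the Collections.synchronizedList documentation: individual calls are thread-safe, but iteration (and any other compound operation) must manually synchronize on the wrapper itself, for example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<String> syncList =
        Collections.synchronizedList(new ArrayList<String>());
syncList.add("a");
syncList.add("b");

// Single calls like get/add are safe on their own, but iteration must
// hold the wrapper's own lock, or a concurrent write from another
// thread can cause a ConcurrentModificationException.
synchronized (syncList) {
    for (String s : syncList) {
        System.out.println(s);
    }
}
```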