I need to design a data structure that supports the following operations:

  • search for an element in the data structure based on a key that falls within an interval. For example, the value for the interval 1-5 may be 3, for 6-11 may be 5, and so on. The intervals are contiguous and do not overlap.
  • find the previous and next interval - this is almost as frequent as searching for an interval.
  • split the intervals, join consecutive intervals
  • concurrency: I have restricted the design to one writer thread and several reader threads, as follows. The writer thread can split or join intervals, or modify the value in an interval. A reader thread reads the value only within one interval (a reader may read multiple intervals, but those reads need not be serialized - it is okay to have writes in between two reads). There are about 20-80 reads by each reader per write. Further, I still have to decide the number of readers, but it would be around 2-8.

I am considering a list, since it supports adding and deleting elements in the middle. There would only be a limited number of intervals, so using a map is probably not right. This kind of access (one writer, multiple readers) is not well supported by std::list; boost::intrusive::list seems appropriate. On top of the intrusive list, I would have to acquire locks to read/write the intervals.

Also, I understand an intrusive list can give better cache locality than std::list (along with appropriate memory allocation for the contained objects).

Is this approach all right? If so, I would also be interested to hear about your experience with boost::intrusive::list, particularly in a multithreaded application.

A: 

A concurrent binary tree might be a good fit, allowing reads and writes to different intervals to proceed in parallel.

Dave
+1 Agree. This feels like a one-dimensional BSP.
Magnus Skog
+4  A: 

You have 2 different issues here:

  1. How to represent your data structure
  2. How to make it thread-safe in an efficient manner

Your data structure will be doing (20-80) x (2-8) reads for every write - somewhere between roughly 40 and 640.

(1). First, let's assume your interval is a data structure as follows

    struct Interval
    {
        Interval(int start, int length)
          : m_start(start),
            m_length(length)
        {}
        int m_start;
        int m_length;
        int value; // Or whatever
    };

Since reads massively outnumber writes, lookup needs to be fast, while modifications don't.

Using a list for your data structure means O(N) lookups and O(1) modification - exactly the wrong way around.

The simplest possible representation of your structure is a vector. If the intervals are held in sorted order, lookups are O(logN) and modifications are O(N).

To implement this, just add a comparator to Interval:

    bool operator<(const Interval& rhs) const
    {
        return m_start < rhs.m_start;
    }

You can then binary-search the sorted vector in O(logN): std::upper_bound on the search key returns the first interval that starts after it, so the element just before that is the interval containing the key (std::lower_bound also works, stepping back one element when the key is not exactly an interval's start).
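
For illustration, a lookup might look like this (a minimal sketch; the helper name find_interval is mine, and it assumes the key falls inside one of the intervals):

    #include <algorithm>
    #include <vector>

    // Find the interval containing 'key' in a vector sorted by m_start.
    std::vector<Interval>::const_iterator
    find_interval(const std::vector<Interval>& intervals, int key)
    {
        // First interval whose start is strictly greater than the key...
        auto it = std::upper_bound(intervals.begin(), intervals.end(),
                                   Interval(key, 0));
        // ...so the one just before it is the interval containing the key.
        return it == intervals.begin() ? intervals.end() : it - 1;
    }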

Next and previous interval are O(1) - decrement or increment the returned iterator.

Splitting an interval means inserting a new element after the current one and adjusting the length of the current - O(N).

Joining two intervals means adding the length of the next one to the current one and erasing the next one - O(N).
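
As a rough sketch of those two operations (the function names and parameters are mine; bounds checking is omitted):

    // Split the interval at 'it' so that a new interval begins at 'at'.
    void split(std::vector<Interval>& intervals,
               std::vector<Interval>::iterator it, int at, int newValue)
    {
        int oldEnd = it->m_start + it->m_length;
        it->m_length = at - it->m_start;              // shrink the current interval
        // O(N): inserting shifts all later elements one slot to the right.
        auto inserted = intervals.insert(it + 1, Interval(at, oldEnd - at));
        inserted->value = newValue;
    }

    // Join the interval at 'it' with the one immediately after it.
    void join(std::vector<Interval>& intervals,
              std::vector<Interval>::iterator it)
    {
        it->m_length += (it + 1)->m_length;           // absorb the next interval
        intervals.erase(it + 1);                      // O(N): later elements shift left
    }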

You should reserve() enough space in the vector for the maximum number of elements to minimise resizing overhead.

(2). Following Knuth, 'premature optimisation is the root of all evil'.

A single read/write lock on the structure holding your vector<Interval> will more than likely be sufficient. The only possible problems are (2a) writer starvation because readers monopolise the lock, or (2b) reader starvation because the writer's updates take too long.
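
For reference, that baseline could be as small as this (a sketch assuming C++17's std::shared_mutex and the find_interval helper sketched above; any reader/writer lock, e.g. boost::shared_mutex, plays the same role):

    #include <shared_mutex>
    #include <vector>

    class IntervalStore
    {
    public:
        int value_at(int key) const
        {
            std::shared_lock<std::shared_mutex> lock(m_mutex);   // many concurrent readers
            return find_interval(m_intervals, key)->value;
        }

        void set_value(int key, int newValue)
        {
            std::unique_lock<std::shared_mutex> lock(m_mutex);   // single exclusive writer
            auto i = find_interval(m_intervals, key) - m_intervals.cbegin();
            m_intervals[i].value = newValue;                     // splits and joins go here too
        }

    private:
        mutable std::shared_mutex m_mutex;
        std::vector<Interval>     m_intervals;
    };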

(2a) If (and only if) you face writer starvation, you could make the locking more granular. It's extremely likely that this won't be the case, though. To do this:

Make your vector hold its intervals by pointer, not value. This is so the resizes do not move your objects around in memory. Have each interval contain a read/write lock.

For reads: Take the read lock of the collection, then of the interval you want. If you don't need to read any other intervals, give up the collection lock as soon as you have acquired the interval lock, to allow other threads to proceed.

If you need to read other intervals, you can read-lock them in any order until you give up the collection read lock, at which point the writer may add or remove any intervals you haven't locked. Order doesn't matter while acquiring these locks, since the writer cannot be changing the vector while you hold a read lock on the collection, and read locks do not contend.

For writes:

Take the write lock of the collection, then of the interval you want. Note that you must hold the collection write lock for the entirety of any updates that will add or remove intervals. You can give up the collection lock if you are only updating one interval. Otherwise, you need to hold the collection write lock and acquire a write lock on any intervals you will be modifying. You can acquire the interval locks in any order, since no readers can acquire new read locks without the collection lock.

The above works by being more selfish to the writer thread, which should eliminate starvation.
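
To make this concrete, a sketch of the read path under the granular scheme (the types and names are mine; it assumes a reader/writer lock per interval and relies on the writer taking an interval's write lock before modifying or erasing it, as described above):

    #include <algorithm>
    #include <memory>
    #include <shared_mutex>
    #include <vector>

    struct LockedInterval : Interval
    {
        using Interval::Interval;
        mutable std::shared_mutex mutex;              // per-interval read/write lock
    };

    std::shared_mutex collectionMutex;                // guards the vector itself
    std::vector<std::unique_ptr<LockedInterval>> intervals;

    int read_value(int key)
    {
        std::shared_lock<std::shared_mutex> collectionLock(collectionMutex);
        auto it = std::upper_bound(intervals.begin(), intervals.end(), key,
            [](int k, const std::unique_ptr<LockedInterval>& i)
            { return k < i->m_start; });
        LockedInterval& interval = **(it - 1);        // assumes 'key' is covered
        std::shared_lock<std::shared_mutex> intervalLock(interval.mutex);
        collectionLock.unlock();                      // let other threads at the vector
        return interval.value;                        // the interval stays pinned by its own lock
    }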

(2b) If you face reader starvation, which is even more unlikely, the best solution is to split the collection writes and reads apart. Hold the collection by shared pointer, and have a single write lock on it.

For reads: Take the write lock and a copy of the shared_ptr. Give up the write lock. The reader can now read the collection without any locks (it's immutable).

For writes: Take a shared_ptr to the collection as per the reader, giving up the lock. Make a private copy of the collection and modify it (no locks required since it's a private copy). Take the write lock again and replace the existing shared_ptr with your new collection. The last thread to finish with the old collection will destroy it. All future threads will use the newly updated collection.
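
A compact sketch of this copy-on-write scheme (using std::shared_ptr; with a single writer, a plain mutex guarding the pointer serves as the write lock):

    #include <memory>
    #include <mutex>
    #include <vector>

    std::mutex ptrMutex;                              // the single write lock on the pointer
    std::shared_ptr<const std::vector<Interval>> current =
        std::make_shared<const std::vector<Interval>>();

    // Reader: grab a snapshot under the lock, then read it with no locks at all.
    std::shared_ptr<const std::vector<Interval>> snapshot()
    {
        std::lock_guard<std::mutex> lock(ptrMutex);
        return current;
    }

    // Single writer: publish a privately modified copy of the collection.
    void publish(std::vector<Interval> updated)
    {
        auto fresh = std::make_shared<const std::vector<Interval>>(std::move(updated));
        std::lock_guard<std::mutex> lock(ptrMutex);
        current = std::move(fresh);                   // old copy dies with its last reader
    }

The writer's sequence is then: snapshot(), copy the vector, apply the split/join/modify to the private copy, and publish() the result.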

Note that this algorithm only works with one writer, as per your problem description.

Jon
Thanks for the detailed answer. I would have given +5, if it were allowed.
Amit Kumar