The standard union/find or disjoint-set data structure has very good running times (effectively O(1)) for single-threaded cases. However, what is its validity/performance in multi-threaded cases? I think that it is totally valid even without locking or any atomic operations, aside from atomic pointer-sized writes.

Does anyone see any problems with the following logic?

First, I will assume that pointer-sized writes are atomic. From that, it's not hard to argue that you can safely run the find function in several threads, as the only updates that will occur are all sets to the same value. If you allow the find function to return the answer that was true when it was called (as opposed to when it returned), it's not hard to argue that many finds and a single union can run at the same time: the argument for the finds doesn't change, and the union only updates roots while the finds never update roots.
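For concreteness, here is roughly the find I have in mind (a sketch only, in C; the names are mine, rank is dropped, and a root points to itself here, though the NULL-for-root variant works the same way):

struct node {
    struct node *parent;    /* a root points to itself */
};

/* Return the root of n's set.  Every write below replaces a parent
   pointer with an ancestor of that node, so a racing find can only
   make the same kind of benign update. */
struct node *find(struct node *n)
{
    struct node *root = n;
    while (root->parent != root)            /* walk up to the current root */
        root = root->parent;

    /* Path compression: re-point everything we walked over at the root
       we found, stopping if we hit a node that has itself become a root
       in the meantime. */
    while (n != root && n->parent != n) {
        struct node *next = n->parent;
        n->parent = root;                   /* plain pointer-sized write */
        n = next;
    }
    return root;
}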

As for the remaining case (several unions), I think that works as well but I'm not so sure of it.

BTW: I'm not requiring that the solution be just as efficient as the single-threaded version. (In order to avoid locks/atomics, I'm also willing to discard globally coherent state.)


edit: taking another look, the many-unions case doesn't work, because if the side that does not become the new root is unioned with something else (also not as the root), then you can cut it off from the other side of the second union, as the interleaving below shows:

A = find(a)  // union 1
B = find(b)  // union 1
----
X = find(x)  //   union 2 (x == a) -> (X == A) -> (X.par aliases A.par)
Y = find(y)  //   union 2
X.par = Y    //   union 2
----
A.par = B    // union 1

this can be sidestepped with a CAS that only links A under B while A is still a root:

while (!CAS(&A.par, A, B)) A = find(A);
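Spelled out a little more (a sketch only, using C11 atomics; the names are mine, and path compression and rank/size bookkeeping are omitted to keep it short):

#include <stdatomic.h>

struct node {
    _Atomic(struct node *) parent;          /* a root points to itself */
};

/* Minimal find (no compression), just to make the sketch self-contained. */
static struct node *find(struct node *n)
{
    for (struct node *p; (p = atomic_load(&n->parent)) != n; n = p)
        ;
    return n;
}

/* Union: link A under B only if A is still a root.  If the CAS fails,
   some other union re-parented A in the meantime, so chase the new
   roots and retry. */
static void unite(struct node *a, struct node *b)
{
    struct node *A = find(a);
    struct node *B = find(b);
    while (A != B) {
        struct node *expected = A;          /* "A is a root" == "A->parent == A" */
        if (atomic_compare_exchange_strong(&A->parent, &expected, B))
            return;
        A = find(A);
        B = find(B);
    }
}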
A:

The type of synchronization you can use for this structure is similar to the readers-writers problem. The performance will depend on how many unions are executed, because the find operations have to stop while a union runs.

I'm not sure that you can run many finds and a single union at the same time, because the last step of a union operation is not atomic.
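Something along these lines, for example (a sketch only, with a POSIX rwlock; the names are mine, and path compression is left out because it would write under the read lock):

#include <pthread.h>

struct node {
    struct node *parent;                    /* a root points to itself */
};

static pthread_rwlock_t dsu_lock = PTHREAD_RWLOCK_INITIALIZER;

static struct node *find_root(struct node *n)   /* caller holds the lock */
{
    while (n->parent != n)
        n = n->parent;
    return n;
}

/* Many finds can run concurrently under the read lock... */
struct node *find(struct node *n)
{
    pthread_rwlock_rdlock(&dsu_lock);
    struct node *root = find_root(n);
    pthread_rwlock_unlock(&dsu_lock);
    return root;
}

/* ...but each union takes the write lock, so finds stall while it runs. */
void unite(struct node *a, struct node *b)
{
    pthread_rwlock_wrlock(&dsu_lock);
    struct node *A = find_root(a);
    struct node *B = find_root(b);
    if (A != B)
        A->parent = B;
    pthread_rwlock_unlock(&dsu_lock);
}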

Ionel Bratianu
If the non-atomic union thing you are referring to is the rank part, yeah, you've got me there. But if you drop that, I don't think you will lose much.
BCS
See edit. OTOH, this issue is not a problem for the 1-union/many-finds case, since find will never find something that isn't a parent in the current group, and any assignment is valid as long as the assigned value is a parent in the current group.
BCS
Suppose you have 2 elements in the same set and you do finds on them. If, during the second find, the parent is changed right before it is reached, the result can be messed up. So you will need to lock the sets while you are doing a union. This example can be solved in another way, but I can't find a better one.
Ionel Bratianu
I don't agree: at the start of a find, every node in the tree points to something closer to the root. During the update step, the pointer is updated from something valid to something else that is also valid. (BTW, I'm assuming NULL denotes the root.) A find/union would work almost the same way.
BCS
A: 

Be aware that the assignment operator is not thread-safe.

e.g.

x = y;

is not thread safe.

Blank Xavier
Could you elaborate? The only requirement I need is that pointer-sized (i.e., register-sized) writes never exist in a partly-done state where some bits are updated and some are not. I can force alignment if needed.
BCS
It's not partial writes that are the issue; it's race conditions. When a variable (like a pointer) is updated, it is read from memory into a register, modified, and written back. Two concurrent threads can race: A reads and modifies, B reads and modifies, A writes back, B writes back.
Blank Xavier
Ah, that's what you are getting at. The neat thing about the algorithm (and the point I'm wanting reviewed) is that I don't /think/ that matters. No thread will ever attempt to write something that is invalid. At worst, a race will result in a node's pointer being adjusted to something that is suboptimal *but still valid*. Now, if the writes were smaller than size_t, or the cache could do dirty writes for the rest of a cache line, that would be a different matter.
BCS
I wonder if the code will run properly over multiple cores, given that said cores may see loads and stores retired in different orders.
Blank Xavier