The standard union/find or Disjoint-set data structure has very good running times (effectively O(1)
) for single threaded cases. However what is it's validity/performance under multi-threaded cases? I think that it is totally valid even without locking or any atomic operations aside from atomic pointer sized writes.
Does anyone see any problems with the following logic?
First I will assume that pointer sized writes are atomic. From that, it's not hard to argue that you can safely run the
find
function in several threads as the only updates that will occur are all sets to the same value. If you allow thefind
function to return the answer that was true when it was called (as opposed to when it returned) it's not hard to argue that manyfind
s and a singleunion
can be run at the same time; the argument for thefind
s doesn't change and theunion
only updates roots and thefind
s never update roots.As for the remaining case (several
union
s), I think that works as well but I'm not so sure of it.
BTW: I'm not requiring that the solution be just as efficient as the single threaded version. (In order to avoid locks/atomic, I'm also willing to discard globally coherent state.)
edit: taking another look, the many-union cases doesn't work because if the side that is not the new root is unioned with something else (also not as a root) then you could cut it off of other side of the second union.
A = find(a) // union 1
B = find(b) // union 1
----
X = find(x) // union 2 (x == a) -> (X == A) -> (X.par aliases A.par)
Y = find(y) // union 2
X.par = Y // union 2
----
A.par = B // union 1
this can be sidestepped with a CAS:
while(!CAS(A == A.par, B)) A = find(A);