views:

151

answers:

3

Today, I had discussion with someone about Kruskal Minimum Spanning Tree algorithm because of page 13 of this slide.

The author of the presentation said that if we implement disjoint sets using (doubly) linked list, the performance for Make and Find will be O(1) and O(1) respectively. The time for operation Union(u,v) is min(nu,nv), where nu and nv are the sizes of the sets storing u and v.

I said that we can improve the time for the Union(u,v) to be O(1) by making the representation pointer of each member pointing a locator that contains the pointer to the real representation of the set.

In Java, the data structure would look like this :

class DisjointSet {
    LinkedList<Vertex> list = new LinkedList<Vertex>(); // for holding the members, we might need it for print

    static Member makeSet(Vertex v) {
        Member m = new Member();

        DisjointSet set = new DisjointSet();
        m.set = set;
        set.list.add(m);

        m.vertex = v;

        Locator loc = new Locator();
        loc.representation = m;
        m.locator = loc;

        return m;
    }
}

class Member {
    DisjointSet set;
    Locator locator;
    Vertex vertex;

    Member find() {
        return locator.representation;
    }

    void union(Member u, Member v) { // assume nv is less than nu
        u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)
        v.set = u.set;
        v.locator.representation = u.locator.representation;
    }

}

class Locator {
    Member representation;
}

Sorry for the minimalistic code. If it can be made this way, than running time for every disjoint set operation (Make,Find,Union) will be O(1). But the one whom I had discussion with can't see the improvement. I would like to know your opinion on this.

And also what is the fastest performance of Find/Union in various implementations? I'm not an expert in data structure, but by quick browsing on the internet I found out there is no constant time data structure or algorithm to do this.

+1  A: 

Don't you also have to update the locator for the other elements in the same set as v? Where you are doing that?

Based on your comments, even if you are using pointer for locators, i believe you intended v.locator.representation = u.locator.representation

Consider the following case.

First you do A = B union C.

Now you do E = D Union A, will it work? What happens to the representation value of elements which were originally in C now?

Moron
All elements in a set point to a locator, and that locator has a pointer to the real representation (for example the first element). That's why we only need to update the locator's pointer.
monn
@monn:You are doing v.locator = u.locator. Don't you also have to do that for other elements in the same set as v?
Moron
@monn: Even if locator was a pointer and you intended `v.locator.representation = u.locator.representation` it is wrong after you a a bunch of unions.
Moron
@monn: Are you convinced now? Check out the edit to my answer about multiple unions...
Moron
+2  A: 

My intuition agrees with your colleague. You say:

u.set.list.append(v.set.list); // hypothetical method, append a list in O(1)

It looks like your intent is that the union is done via the append. But, to implement Union, you would have to remove duplicates for the result to be a set. So I can see an O(1) algorithm for a fixed set size, for example...

Int32 set1;
Int32 set2;

Int32 unionSets1And2 = set1 | set2;

But that strikes me as cheating. If you're doing this for general cases of N, I don't see how you avoid some form of iterating (or hash lookup). And that would make it O(n) (or at best O(log n)).

FYI: I had a hard time following your code. In makeSet, you construct a local Locator that never escapes the function. It doesn't look like it does anything. And it's not clear what your intent is in the append. Might want to edit and elaborate on your approach.

Rob
The context for this disjoint set data structure is Kruskal's MST (as you can see on the slide). We grow forests initially by making each vertex as a forest. Then, there is no duplicate.
monn
Ah, I forgot to set it. I've edited the code.I do the append only for the sake of printing it later when it's needed. For the Kruskal's MST, I think we don't need to have the list.
monn
+2  A: 

Using Tarjan's version of the Union-Find structure (with path compression and rank-weighed union), a sequence of m Finds and (n-1) intermixed Unions would be in O(m.α(m,n)), where α(m,n) is the inverse of Ackermann function which for all practical values of m and n has value 4. So this basically means that Union-Find has worst case amortized constant operations, but not quite.

To my knowledge, it is impossible to obtain a better theoretical complexity, though improvements have led to better practical efficiency.

For special cases of disjoint-sets such as those used in language theory, it has been shown that linear (i.e., everything in O(1)) adaptations are possible---essentially by grouping nodes together---but these improvements cannot be translated to the general problem. On the other hand of the spectrum, a somewhat similar core idea has been used with great success and ingenuity to make an O(n) algorithm for minimum spanning tree (Chazelle's algorithm).

So your code cannot be correct. The error is what Moron pointed out: when you make the union of two sets, you only update the "representation" of the lead of each list, but not of all other elements---while simultaneously assuming in the find function that every element directly knows its representation.

Jérémie
I only need to update it because all elements refer to it. And it is a pointer to the representation.
monn