ansaurus

Question

Answer 1

A:

I would guess that you have a relatively small set of attributes per Person object (compared to the number of Person objects you're considering). To avoid traversing the list of Person objects multiple times, you can take a Person, put its attributes into a list of known possible connections, and then move on to the next Person. With each successive Person, you check whether it is connected to any prior connection. If so, you add its unique attributes to the possible connections. You should be able to process all Person objects in one pass. It's possible that you'll have some disconnected sets in the results, so it may be worth examining the unconnected Person objects after you've created the first graph.
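One way to read the one-pass idea above is sketched here. It is only an interpretation, not necessarily what the answer intended: each person is assumed to be a set of attribute strings such as "ssn:1" (a hypothetical encoding), and the class and method names are made up for illustration. A new person absorbs every existing group it shares an attribute with, so a person who bridges two earlier groups joins them.

```java
import java.util.*;

class OnePassGrouping {
    // A group of person indices plus every attribute seen in that group.
    static class Group {
        final Set<Integer> members = new TreeSet<>();
        final Set<String> attributes = new HashSet<>();
    }

    // Hypothetical input: people.get(i) is the attribute set of person i.
    static List<Set<Integer>> group(List<Set<String>> people) {
        List<Group> groups = new ArrayList<>();
        for (int i = 0; i < people.size(); i++) {
            Group merged = new Group();
            merged.members.add(i);
            merged.attributes.addAll(people.get(i));
            // Absorb every existing group that shares an attribute with this person.
            Iterator<Group> it = groups.iterator();
            while (it.hasNext()) {
                Group g = it.next();
                if (!Collections.disjoint(g.attributes, people.get(i))) {
                    merged.members.addAll(g.members);
                    merged.attributes.addAll(g.attributes);
                    it.remove();
                }
            }
            groups.add(merged);
        }
        List<Set<Integer>> result = new ArrayList<>();
        for (Group g : groups) {
            result.add(g.members);
        }
        return result;
    }
}
```

The list of people is traversed once, but each person is still compared against every current group, so the cost depends on how many groups remain separate.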

Bernard Chen 2009-06-08 22:05:40

Answer 2

A:

JPS 2009-06-08 22:14:20

This approach is missing a Step 6 which would then continuously try to merge every multi-collection until a pass was made where nothing merged.

Sugerman 2009-06-09 16:24:53

I have now updated the explanation to show why Sugerman's comment is not needed.

JPS 2009-08-24 11:53:11

Answer 3

A:

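// Fragment: assumes a Person class with isRelatedTo(Person), a mutable
// List<Person> people, and imports from java.util.
List<Set<Person>> sets = new ArrayList<>();

// Phase 1: greedily grow a set around each remaining person.
while (!people.isEmpty()) {
    Set<Person> set = new HashSet<>();
    set.add(people.remove(0));
    // Explicit iterator: removing inside a for-each loop would throw
    // ConcurrentModificationException.
    Iterator<Person> it = people.iterator();
    while (it.hasNext()) {
        Person person = it.next();
        if (set.stream().anyMatch(person::isRelatedTo)) {
            set.add(person);
            it.remove();
        }
    }
    sets.add(set);
}

// Phase 2: a single sweep can miss links to members added later, so merge
// related sets until a full pass merges nothing.
boolean merged = true;
while (merged) {
    merged = false;
    for (int i = 0; i < sets.size() && !merged; i++) {
        for (int j = i + 1; j < sets.size() && !merged; j++) {
            Set<Person> a = sets.get(i);
            Set<Person> b = sets.get(j);
            boolean related = a.stream().anyMatch(
                    p -> b.stream().anyMatch(p::isRelatedTo));
            if (related) {
                a.addAll(b);
                sets.remove(j);  // remove by index
                merged = true;
            }
        }
    }
}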

Carl Manaster 2009-06-08 22:20:37

Answer 4

+4 A:

To expand on my comment in the original post, you want to create a list of sets where each member of a given set shares at least one attribute with at least one other member of that set.

Naively, this can be solved by finding all pairs that share an attribute and iteratively merging pairs that have a partner in common. This would be O(N^3): N^2 for iterating over pairs, and up to N separate sets to check for membership.

You can also think of this problem as determining the connected components of a graph, where every object and every unique attribute value is a node, and each object is connected to each of its attribute values. Setting up that graph takes time linear in the total number of object-attribute pairs, and you can then find the connected components in linear time with a breadth-first or depth-first search.
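A minimal sketch of that graph view, assuming each object is given as a list of attribute keys such as "ssn:1" (a hypothetical encoding; the class and method names are also illustrative). Attribute nodes are kept implicit in a map from attribute to objects, and a breadth-first search from each unvisited object collects one component.

```java
import java.util.*;

class AttributeGraphComponents {
    // Groups object indices into connected components. Each object is a
    // list of attribute keys, e.g. "ssn:1" (hypothetical encoding).
    static List<Set<Integer>> components(List<List<String>> objects) {
        // Attribute node -> the objects attached to it.
        Map<String, List<Integer>> byAttribute = new HashMap<>();
        for (int i = 0; i < objects.size(); i++) {
            for (String attr : objects.get(i)) {
                byAttribute.computeIfAbsent(attr, k -> new ArrayList<>()).add(i);
            }
        }
        boolean[] seenObject = new boolean[objects.size()];
        Set<String> seenAttribute = new HashSet<>();
        List<Set<Integer>> result = new ArrayList<>();
        for (int start = 0; start < objects.size(); start++) {
            if (seenObject[start]) continue;
            Set<Integer> component = new TreeSet<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seenObject[start] = true;
            while (!queue.isEmpty()) {
                int current = queue.poll();
                component.add(current);
                for (String attr : objects.get(current)) {
                    // Expand each attribute node only once.
                    if (!seenAttribute.add(attr)) continue;
                    for (int neighbor : byAttribute.get(attr)) {
                        if (!seenObject[neighbor]) {
                            seenObject[neighbor] = true;
                            queue.add(neighbor);
                        }
                    }
                }
            }
            result.add(component);
        }
        return result;
    }
}
```

Marking both objects and attribute values as seen means every node and edge is handled a bounded number of times, which is where the linear bound comes from.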

MSN 2009-06-08 22:52:06

+1 for mentioning the linear-time connected components algorithm. http://en.wikipedia.org/wiki/Strongly_connected_component has links to the canonical choices.

Dave 2009-06-09 00:48:42

Thank you very much for the information.

Sugerman 2009-06-09 13:54:39

On second thought, I do not think this approach will work with an undirected graph.

Sugerman 2009-06-09 16:18:23

As part of the traversal you need a way to mark nodes so that you don't visit them again. That takes care of backwards traversal.

MSN 2009-06-09 17:41:04

It's also a bit of overkill since you basically have to do the same work as finding elements in a set.

MSN 2009-06-09 17:41:44

Answer 5

A:

First, is there some inherent hierarchy in identifiers, and do contradicting identifiers of a higher sort cancel out the same identifier of a lower sort? For example, if A and B have the same SSN, B and C have the same DLN, and C and D have the same SSN which does not match A and B's SSN, does that mean that there are two groups or one?

Assuming contradictions don't matter, you are dealing with equivalence classes, as user 57368 (unknown Google) states. For equivalence classes, people often turn to the Union-find structure. How to perform the unions is not immediately obvious, because I assume you don't have the direct link A-B when both A and B have the same SSN. Instead, our sets will consist of two kinds of elements. Each attribute, that is, each (attribute type, attribute value) pair, is an element. You also have elements corresponding to objects. When you iterate through the list of attributes for an object, perform the union (object, attribute).

One of the important features of the Union-find data structure is that the resulting structure represents the set. It lets you query "What set is A in?" If this is not enough, let us know and we can improve the result.

But the most important feature is that each union and query operation runs in near-constant amortized time (inverse Ackermann, when path compression and union by rank or size are used).
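A rough sketch of such a structure, assuming elements are plain strings with objects keyed like "person:A" and attributes like "ssn:111" (hypothetical naming, as are the class and method names). It uses path compression and union by size.

```java
import java.util.*;

class UnionFind {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, Integer> size = new HashMap<>();

    // Returns the representative of x's set, registering x if it is new.
    String find(String x) {
        parent.putIfAbsent(x, x);
        size.putIfAbsent(x, 1);
        String root = x;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path at the root.
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Merges the sets containing a and b, attaching the smaller to the larger.
    void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (rootA.equals(rootB)) return;
        if (size.get(rootA) < size.get(rootB)) {
            String tmp = rootA;
            rootA = rootB;
            rootB = tmp;
        }
        parent.put(rootB, rootA);
        size.put(rootA, size.get(rootA) + size.get(rootB));
    }

    // Answers "are a and b in the same set?"
    boolean sameSet(String a, String b) {
        return find(a).equals(find(b));
    }
}
```

For the earlier example, calling union("person:A", "ssn:111"), union("person:B", "ssn:111"), union("person:B", "dln:9") and union("person:C", "dln:9") puts A, B and C in one set, because B links the shared SSN to the shared DLN.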

Martin Hock 2009-06-09 23:08:45


Union of All Intersecting Sets
