tags:

views:

113

answers:

3

Having an array of N sets, what is the fastest way to generate the inclusion graph of these sets?

+1  A: 

If you have a set of N sets none of which contain another, all of the same size, you are going to need to do N^2 set comparisons.

deinst
I believe so. If the sets are all the same size, we can hash, but if they are all nearly the same size, I cannot see anyway around doing O(number of elements) comparisons on average. We can quit as soon as we know that inclusion cannot happen (a mismatch in the smaller set, or too many mismatches in the larger) but that doesn't help.
deinst
How much an inclusion comparison will cost? A element's comparison operation? Does this leave us with an O(N^2*A) algorithm?
banx
I could be wrong. I often am.
deinst
+1  A: 

Well, let's reason about it in terms of worst case, and for the sake of simplicity assume that you have an efficient set representation, in which you can check for membership in constant time.

To determine if one set X is a subset of Y, you have to do |X| number of membership tests on Y. Takes time linearly proportional to |X|.

So if you have N sets, and want to figure out which sets are subset of which other sets, you would have to do N2 such subset-tests, thus I suppose you end up with a complexity of O(AN2) where A is the number of elements in the largest set.

Even if you could do something smart to decide "X subset Y" and "Y subset X" at the same time, you wouldn't gain more than a factor 2, so the complexity would not improve.

aioobe
I guess that the final complexity would be O(N^2*A) where A is the maximum number of elements in the sets.
banx
Of course. I mixed it all up. Updating the answer...
aioobe
+1  A: 

First off, you can show that this graph will contain O(n^2) edges given n sets: consider sets A1, ..., An with each Ak = {1, ..., k}. Then A1 subset A2, A1 subset A3, ..., A1 subset An, A2 subset A3, ..., which is n(n-1)/2 edges in the inclusion graph.

Given that, I can think of a reasonably simple way to tackle this problem. Let Ai maybe-subset Aj iff there is some x in Aj which is not in Ai. Now, Ai subset Aj iff Ai maybe-subset Aj and not Aj maybe-subset Ai.

Now, for each element x divide your sets into two: those that contain x and those that don't. The latter maybe-subset the former. Add the corresponding edges to the maybe-subset graph. Whenever we have a pair of vertices connected in each direction, we know neither vertex is a subset of the other. This algorithm is O(mn^2) for a universe of m elements and n sets.

Et voila!

Rafe