Having an array of N
sets, what is the fastest way to generate the inclusion graph of these sets?
views:
113answers:
3If you have a set of N sets none of which contain another, all of the same size, you are going to need to do N^2 set comparisons.
Well, let's reason about it in terms of worst case, and for the sake of simplicity assume that you have an efficient set representation, in which you can check for membership in constant time.
To determine if one set X is a subset of Y, you have to do |X| number of membership tests on Y. Takes time linearly proportional to |X|.
So if you have N sets, and want to figure out which sets are subset of which other sets, you would have to do N2 such subset-tests, thus I suppose you end up with a complexity of O(AN2) where A is the number of elements in the largest set.
Even if you could do something smart to decide "X subset Y" and "Y subset X" at the same time, you wouldn't gain more than a factor 2, so the complexity would not improve.
First off, you can show that this graph will contain O(n^2) edges given n sets: consider sets A1, ..., An with each Ak = {1, ..., k}. Then A1 subset A2, A1 subset A3, ..., A1 subset An, A2 subset A3, ..., which is n(n-1)/2 edges in the inclusion graph.
Given that, I can think of a reasonably simple way to tackle this problem. Let Ai maybe-subset Aj iff there is some x in Aj which is not in Ai. Now, Ai subset Aj iff Ai maybe-subset Aj and not Aj maybe-subset Ai.
Now, for each element x divide your sets into two: those that contain x and those that don't. The latter maybe-subset the former. Add the corresponding edges to the maybe-subset graph. Whenever we have a pair of vertices connected in each direction, we know neither vertex is a subset of the other. This algorithm is O(mn^2) for a universe of m elements and n sets.
Et voila!