Algorithm: What is the fastest way to check for set inclusion?

views:

113

answers:

+1 Q:

Algorithm: What is the fastest way to check for set inclusion?

Having an array of N sets, what is the fastest way to generate the inclusion graph of these sets?

+1 A:

If you have a set of N sets none of which contain another, all of the same size, you are going to need to do N^2 set comparisons.

deinst 2010-08-12 12:23:18

I believe so. If the sets are all the same size, we can hash, but if they are all nearly the same size, I cannot see anyway around doing O(number of elements) comparisons on average. We can quit as soon as we know that inclusion cannot happen (a mismatch in the smaller set, or too many mismatches in the larger) but that doesn't help.

deinst 2010-08-12 12:36:42

How much an inclusion comparison will cost? A element's comparison operation? Does this leave us with an O(N^2*A) algorithm?

banx 2010-08-12 12:37:07

I could be wrong. I often am.

deinst 2010-08-12 12:37:09

+1 A:

Well, let's reason about it in terms of worst case, and for the sake of simplicity assume that you have an efficient set representation, in which you can check for membership in constant time.

To determine if one set X is a subset of Y, you have to do |X| number of membership tests on Y. Takes time linearly proportional to |X|.

So if you have N sets, and want to figure out which sets are subset of which other sets, you would have to do N² such subset-tests, thus I suppose you end up with a complexity of O(AN²) where A is the number of elements in the largest set.

Even if you could do something smart to decide "X subset Y" and "Y subset X" at the same time, you wouldn't gain more than a factor 2, so the complexity would not improve.

aioobe 2010-08-12 12:30:42

I guess that the final complexity would be O(N^2*A) where A is the maximum number of elements in the sets.

banx 2010-08-12 13:31:41

Of course. I mixed it all up. Updating the answer...

aioobe 2010-08-12 13:48:16

+1 A:

First off, you can show that this graph will contain O(n^2) edges given n sets: consider sets A1, ..., An with each Ak = {1, ..., k}. Then A1 subset A2, A1 subset A3, ..., A1 subset An, A2 subset A3, ..., which is n(n-1)/2 edges in the inclusion graph.

Given that, I can think of a reasonably simple way to tackle this problem. Let Ai maybe-subset Aj iff there is some x in Aj which is not in Ai. Now, Ai subset Aj iff Ai maybe-subset Aj and not Aj maybe-subset Ai.

Now, for each element x divide your sets into two: those that contain x and those that don't. The latter maybe-subset the former. Add the corresponding edges to the maybe-subset graph. Whenever we have a pair of vertices connected in each direction, we know neither vertex is a subset of the other. This algorithm is O(mn^2) for a universe of m elements and n sets.

Et voila!

Rafe 2010-08-13 03:08:07

ansaurus

tags:

views:

answers:

Algorithm: What is the fastest way to check for set inclusion?

related questions