views:

112

answers:

3

Lets say I have 4 different values A,B,C,D with sets of identifiers attached.

A={1,2,3,4,5}
B={8,9,4}
C={3,4,5}
D={12,8}

And given set S of identifiers {1,30,3,4,5,12,8} I want it to return C and D. i.e. retrieve all sets from a group of sets for which S is a superset.

Is there any algorithms to perform this task efficiently (Preferably with low memory complexity. Using external device for storing data is not an option) ? A trivial solution would be for each member in the superset S retrieve list of sets that include that member (basically inverted index) and for each returned set check that all of his members are in the superset. Unfortunately because on average the superset will include at least one member for each set there is a significant and unacceptable performance hit with this approach.

I am trying to do this in Java. Set consist of integers and the value they identify is an object. Collection of sets is not static and bound to change during the course of execution. There will be some limit on the set number though. Set size is not limited. But on average it's between 1 and 20.

+3  A: 
  1. Go through each element x in S.
  2. For each set t for which xt, increment a counter—call it tcount—associated with t.
  3. After all that, for each set t for which tcount = | t |, you know that tS.

Application.

After step 2.

Acount = 4,
Bcount = 1,
Ccount = 3,
Dcount = 2.

Step 3 processing.

Acount ≠ |A| (4 ≠ 5) — Reject,
Bcount ≠ |B| (1 ≠ 3) — Reject,
Ccount = |C| (3 = 3) — Accept,
Dcount = |D| (2 = 2) — Accept.

Cirno de Bergerac
This isn't really any different from `for (t : sets) if (S.containsAll(t)) answers.add(t);` which avoids the need to keep an extra array of integers.
Jason Orendorff
A: 

As for Java, I’d use a Hashtable for the lookup table of the elements in S. Then for each element in X, the set you want to test if it’s a subset of S, test if it’s in the lookup table. If all elements of X are also in S, then S is a superset of X.

Gumbo
Never heard of HashSet?
FogleBird
+1  A: 

Note after cgkanchi note: The following algorithm is under the assumption that you don't really use sets but arrays. If that is not the case, you should look for a method which implements intersection of sets and then the problem is trivial. This is about how to implement the notion of intersection using arrays.

  1. Sort all sets using heapsort for in-place sorting O(1) space. It runs in O(nlogn) and soon enough it will pay you back.
  2. For each set L of all sets:

    2.1. j = 0

    2.2. For the i element in L:

    2.2.1. Starting from j element find L[i] in S for which L[i] = S[j] else reject. If L and S and large enough use binary search or interpolation search (for the second one, have a look at your data distibution)

    2.3. Accept

myle
Errr... I thought most set implementations were unordered?
Chinmay Kanchi
Indeed. I assumed that the notion of set was used that to show that he was in need of intersection. If they are real sets, then isn't there any library offering intersection as a method?Formally, you are right. A set is an unordered collection of unique items.
myle