views:

182

answers:

3

I am looking for an efficient way to determine if a set is a subset of another set in Matlab or Mathematica.

Example: Set A = [1 2 3 4] Set B = [4 3] Set C = [3 4 1] Set D = [4 3 2 1]

The output should be: Set A

Sets B and C belong to set A because A contains all of their elements, therefore, they can be deleted (the order of elements in a set doesn't matter). Set D has the same elements as set A and since set A precedes set D, I would like to simply keep set A and delete set D.

So there are two essential rules: 1. Delete a set if it is a subset of another set 2. Delete a set if its elements are the same as those of a preceding set

My Matlab code is not very efficient at doing this - it mostly consists of nested loops.

Suggestions are very welcome!

Additional explanation: the issue is that with a large number of sets there will be a very large number of pairwise comparisons.

+1  A: 

I think the question you mean to ask is "given a list of sets, pick out the set that contains all of the others". There are a bunch of edge cases where I don't know what output you would want (e.g. A = { 1, 2 } and B = { 3, 4 }), so you need to clarify a lot.

However, to answer the question you did ask, about set containment, you can use set difference (equivalently complement wrt another set). In Mathematica, this sort of thing:

setA = {1, 2, 3, 4};
setB = {4, 3};
setC = {3, 4, 1};
setD = {4, 3, 2, 1};
Complement[setD, setA] == {}
 True

indicates setD is a subset of setA.

Jonathan
I added some clarifications above. The output should be set A because it contains sets B and C and set D is simple a rearranged set A. Also, the problem is not designing a logical test to check if a set is a subset but to avoid having a very large number of pairwise comparisons. Therefore, my question is more about an efficient way of approaching the problem.
Edward
+4  A: 

You will likely want to take a look at the built-in set operation functions in MATLAB. Why reinvent the wheel if you don't have to? ;)

HINT: The ISMEMBER function may be of particular interest to you.

EDIT:

Here's one way you can approach this problem using nested loops, but setting them up to try and reduce the number of potential iterations. First, we can use the suggestion in Marc's comment to sort the list of sets by their number of elements so that they are arranged largest to smallest:

setList = {[1 2 3 4],...  %# All your sets, stored in one cell array
           [4 3],...
           [3 4 1],...
           [4 3 2 1]};
nSets = numel(setList);                       %# Get the number of sets
setSizes = cellfun(@numel,setList);           %# Get the size of each set
[temp,sortIndex] = sort(setSizes,'descend');  %# Get the sort index
setList = setList(sortIndex);                 %# Sort the sets

Now we can set up our loops to start with the smallest sets at the end of the list and compare them first to the largest sets at the start of the list to increase the odds we will find a superset quickly (i.e. we're banking on larger sets being more likely to contain smaller sets). When a superset is found, we remove the subset from the list and break the inner loop:

for outerLoop = nSets:-1:2
  for innerLoop = 1:(outerLoop-1)
    if all(ismember(setList{outerLoop},setList{innerLoop}))
      setList(outerLoop) = [];
      break;
    end
  end
end

After running the above code, setList will have all sets removed from it that are either subsets or duplicates of other sets preceding them in the list.

In the best case scenario (e.g. the sample data in your question) the inner loop breaks after the first iteration every time, performing only nSets-1 set comparisons using ISMEMBER. In the worst case scenario the inner loop never breaks and it will perform (nSets-1)*nSets/2 set comparisons.

gnovice
Writing a logical test to check if a set is a subset is not the problem really. What I want to avoid is making a very large number of pairwise comparisons between sets. For example, if I keep my sets in a matrix, I would like to apply an operation to the entire matrix to extract rows with correct sets.
Edward
@Edward: I've updated my answer with a complete solution for you to try. Hopefully it is more efficient than what you were trying previously. ;)
gnovice
A: 

Assuming that if no set is a superset of all the supplied sets, you wish to return the empty set. (I.e. if no set is a superset of all sets, return "no thing".)

So, ..., you want to take the union of all the sets, then find the first set in your list with that many elements. This isn't too hard, skipping the reformatting of the input into internal list form... Mathematica:

    topSet[a_List] := Module[{pool, biggest, lenBig, i, lenI},  
        pool = DeleteDuplicates[Flatten[a]];  
        biggest = {}; lenBig = 0;  
        For[i = 1, i  lenBig, lenBig = lenI; biggest = a[[i]]];  
            ];  
        If[lenBig == Length[pool], biggest, {}]  
    ]  

For instance:

    topSet[{{1,2,3,4},{4,3},{3,4,1},{4,3,2,1}}]  
      {1,2,3,4}  
    topSet[{{4, 3, 2, 1}, {1, 2, 3, 4}, {4, 3}, {3, 4, 1}}]  
      {4,3,2,1}  
    topSet[{{1, 2}, {3, 4}}]  
      {}  

As a large test:

    <<Combinatorica`  
    Timing[Short[topSet[Table[RandomSubset[Range[10^3]], {10^3}]]]]  
      {14.64, {}}  

I.e., a set of 1000 randomly selected subsets of the range [1,1000] was analyzed in 14.64 seconds (and, unsurprisingly none of them happened to be a superset of all of them).

Eric Towers

related questions