views:

138

answers:

3

I got a problem which I do not know how to solve:

I have a set of sets A = {A_1, A_2, ..., A_n} and I have a set B.

The target now is to remove as few elements as possible from B (creating B'), such that, after removing the elements for all 1 <= i <= n, A_i is not a subset of B'.

For example, if we have A_1 = {1,2}, A_2 = {1,3,4}, A_3={2,5}, and B={1,2,3,4,5}, we could e.g. remove 1 and 2 from B (that would yield B'={3,4,5}, which is not a superset of one of the A_i).

Is there an algorithm for determining the (minimal number of) elements to be removed?

+7  A: 

It sounds like you want to remove the minimal hitting set of A from B (this is closely related to the vertex cover problem).

A hitting set for some set-of-sets A is itself a set such that it contains at least one element from each set in A (it "hits" each set). The minimal hitting set is the smallest such hitting set. So, if you have an MHS for your set-of-sets A, you have an element from each set in A. Removing this from B means no set in A can be a subset of B.

All you need to do is calculate the MHS for (A1, A2, ... An), then remove that from B. Unfortunately, finding the MHS is an NP-complete problem. Knowing that though, you have a few options:

  1. If your data set is small, do the obvious brute-force solution
  2. Use a probabilistic algorithm to get a fast, approximate answer (see this PDF)
  3. Run far, far away in the opposite direction
Chris Schmich
A: 

I think you should find the minimum length from these sets and then delete these elements which is in this set.

@davit-datuashvili: Chris' answer is spot on, I suggest you read it.
Moron
A: 

If you just need some approximation, start with the smallest set in A, and remove one element from B. (You could just grab one at random, or check to see which element is in the most sets in A, depending on how accurate, how fast you need)

Now the smallest set in A isn't a subset of B. Move on from there, but check first to see whether or not the sets you're examining are subsets at this point or not.

This reminds me of the vertex covering problem, and I remember some approximation algorithm for that that is similar to this one.

Precision