views:

179

answers:

6

Minimum Set Cover is a question where you must find the minimum number of sets needed to cover every element.
For example, imagine that we have a set of X=array(1,2,3,4,5,6) and 5 another set S, where

S[1]=array(1, 4) 
S[2] =array(2, 5) 
S[3] =array(3, 6)
S[4] =array(1, 2, 3) 
S[5] =array(4, 5, 6)

The problem is to find minimum number of sets of S which cover every element of X. So obviously the minimum set cover in our case will be S[4] and S[5] because they cover all the elements.
Does anybody have an idea how to implement this code in PHP. Note, that this is NP-complete so there is no fast algorithm to solve it. Any solution in PHP will be welcomed. And BTW it is not a homework, I need to use this algorithm in my web application in order to generate suggestion list.
Thanks in advance.

Update 1
There are many applications of Set Covering Problem. Some of the interesting ones are:

  1. Construction of Optimal logic circuits
  2. Air-crew Scheduling
  3. Assembly line balancing
  4. Information retrieval
  5. Art Gallery problem
  6. Genome Sequencing
  7. Red-Blue SetCover problem

Update 2
For example, here you can see the working version of the problem I mentioned. Here, even it shows visually the sets. But I need the pure PHP code for that, if somebody has it please be kind to provide us with the working example in PHP. Thanks

Update 3
Finally, I have solved the problem in PHP. My solution based on the algorithm proposed on a very famous book called Introduction to Algorithms, section The set-covering problem. Here how my solution looks like:

$MainSet=array(1, 2, 3, 4, 5, 6, 7);
$SubSet=array(array(1,4), array(2, 5), array(3, 6), array(1, 2, 3), array(4, 5, 6));

$UncoveredElements=$MainSet;
$CoveringSets=array();
while (count($UncoveredElements)>0){
    $S=SubSetS($UncoveredElements, $SubSet);
    if (is_array($S)){
        $UncoveredElements=array_diff($UncoveredElements, $S);
        $CoveringSets[]=$S;
    }else
        break; //finish work
}
echo "Sets that cover MainSet are:";
var_dump($CoveringSets);

//Subset S is chosen that covers as many uncovered elements as possible
function SubSetS($UncoveredElements, $SubSetArr){
    $max=0; $s=array();
    foreach($SubSetArr as $SubSet){
        $intersectArr=array_intersect($UncoveredElements, $SubSet);
        $weight=count($intersectArr);
        if ($weight>$max){
            $max=$weight;
            $s=$SubSet;
        }
    }
    if ($max>0)
        return $s;
    else
        return 0;
}

Any comments and ideas about my solution? For me, it solves my problem, that is what I wanted. But if you suggest any optimization, correction to the code, please fill free.
BTW, thanks a lot for participants of the question for their valuable responses.

Final Update
This Greedy approach for a Set Covering problem does not always guarantee an optimal solution. Consider the following example:
Given: Elements of Main Set = {1, 2, 3, 4, 5, 6}
Now, consider 4 subsets as follows: Subset 1 = {1, 2}, Subset 2 = {3, 4}, Subset 3 = {5, 6}, Subset 4 = {1, 3, 5}.
The Greedy algorithm for the above collection of Subsets gives the Minimum Set Cover as: Minimum Set Cover contains the Subsets= { 1, 2, 3, 4}.
Thus, though the minimum collection of subsets that cover all the elements of the Main set are the first three subsets, we get the solution containing all the 4 Subsets.

A: 

You can see an explanation of the algorithm here and then translate it from pseudo-code to php. Enjoy!

Bogdan Constantinescu
+1  A: 
  • Find one set cover
  • Find a new set cover with less sets, until you are satisfied.

That is, if your first attempt find a set cover with 4 sets, you can stop at 3 in your next attempts. This is already an improvement over checking all possible covers.

Sjoerd
A: 

If you want to have an optimal solution, you'll have to enumerate all possible combinations. Here's PHP code to do so (consider the comments on the site!).

Then, you have to check, which of your combinationws is the one with the lowest number of used elements. I think you can do this by yourself.

phimuemue
I don't need an optimal solution.
Bakhtiyor
A: 

You need to get all possible combinations of S[i] and then search for the minimum number of sets which cover your original set.

// original set
$x = array(1, 2, 3, 4, 5, 6);
$xcount = sizeof($x);

// subsets
$s = array();
$s[] = array(1, 4);
$s[] = array(2, 5);
$s[] = array(3, 6);
$s[] = array(1, 2, 3);
$s[] = array(4, 5, 6);

// indices pointing to subset elements
$i = range(0, sizeof($s) - 1);

// $c will hold all possible combinations of indices
$c = array(array( ));

foreach ($i as $element)
    foreach ($c as $combination)
        array_push($c, array_merge(array($element), $combination));

// given that $c is ordered (fewer to more elements)
// we need to find the first element of $c which
// covers our set
foreach ($c as $line)
{
    $m = array();
    foreach ($line as $element)
        $m = array_merge($m, $s[$element]);

    // check if we have covered our set
    if (sizeof(array_intersect($x, $m)) == $xcount)
    {
        echo 'Solution found:<br />';
        sort($line);
        foreach ($line as $element)
            echo 'S[',($element+1),'] = ',implode(', ',$s[$element]),'<br />';
        die();
    }
}
Anax
+1  A: 

here it is, finding 11 solutions for your given example:

function array_set_cover($n,$t,$all=false){
    $count_n = count($n);
    $count = pow(2,$count_n);

    for($i=1;$i<=$count;$i++){
        $total=array();
        $anzahl=0;
        $k = $i;
        for($j=0;$j<$count_n;$j++){
                        if($k%2){
                $total=array_merge($total,$n[$j]);
                        }
            $anzahl+=($k%2);
            $k=$k>>1;
        }
                $total = array_unique($total);
        if(count(array_diff($t,$total)) < 1){
            $loesung[$i] = $anzahl;
            if(!$all){
                break;
            }
        }
    }

    asort($loesung);

    foreach($loesung as $val=>$anzahl){
        $bit = strrev(decbin($val));
        $total=0;
        $ret_this = array();
        for($j=0;$j<=strlen($bit);$j++){
            if($bit[$j]=='1'){
                $ret_this[] = $j;
            }   
        }
        $ret[]=$ret_this;
    }

    return $ret;
}

// Inputs
$s = array();
$s[] = array(1, 4);
$s[] = array(2, 5);
$s[] = array(3, 6);
$s[] = array(1, 2, 3);
$s[] = array(4, 5, 6);

// Output
$x = array(1, 2, 3, 4, 5, 6);

var_dump(array_set_cover($s,$x)); //returns the first possible solution (fuc*** fast)

var_dump(array_set_cover($s,$x,true)); // returns all possible solution (relatively fast when you think of all the needet calculations)

if you call that function without the third parameter, the first found solution is given back (as an array with the array-keys of $s that were used) - if you set the third parameter to true, ALL solutions are given back (in the same format, order by number of used items, so the first one should be the best). retun value is like this:

array(1) { // one solution
  [0]=>
  array(3) { // three items used
    [0]=>
    int(0) // used key 0  -> array(1, 4)
    [1]=>
    int(1) // used key 1 ...
    [2]=>
    int(2) // used key 2 ...
  }
}
oezi
@oezi. Thanks a lot man and sorry that I made you work on you lunch. Check up my code on the updated section of the question also. I think my bunch of code is a little bit tidy.
Bakhtiyor
@oezi. 11 solutions? How it is 11, when it should find minimum number of sub-sets covering main set.
Bakhtiyor
@Bakhtiyor: as written it's possible to find only one solution very fast (which isn't the best in most cases) or to find all solutions, ordered by the best at first - so you just have to call that function with the thirt parameter set to true and use the first set (which contains the minimum).
oezi
+4  A: 

Bakhtiyor, what you say you need for this problem is a little bit contradictory. You said first that you want a minimum set cover, and pointed out yourself that the problem is NP-hard. The algorithm that you posted is the greedy set cover algorithm. This algorithm finds a pretty small set cover, but not necessarily the minimum set cover. Two people posted an algorithm that searches more thoroughly and does find a minimum set cover, and you said in one comment that you don't need an optimal solution.

Do you need an optimal solution or not? Because, finding the minimum set cover is a very different algorithm problem from finding a fairly small set cover. An algorithm for the former has to be written very carefully so that doesn't take ages for a large input. (By I large input I mean, say, 40 subsets.) An algorithm for the latter is easy and you did a good job with your own code. The one thing that I would change is that before your your loop statement "foreach($SubSetArr as $SubSet)", I would randomize the list of subsets. That gives the algorithm some chance of being optimal for many inputs. (But, there are examples where the minimum set cover does not include the largest subset, so for some inputs this particular greedy algorithm will have no chance of finding the minimum, regardless of order.)

If you really do want the minimum set cover, and not just a pretty good set cover, then you should state the largest input that you want the code to solve completely. That is a crucial detail that affects how fancy the algorithm needs to be for your task.


To respond to what some other people have said: First, it is certainly not necessary to search over all collections of subsets if you want the optimal solution. Second, you shouldn't be looking for "the" algorithm for the problem. The problem has many algorithms, some faster than others, some more complicated than others.

Here is one way that you can start with an algorithm that searches over all collections, and accelerate it to make something much better. I won't supply code since I don't think that the questioner wants such a fancy solution, but I can describe the ideas. You can arrange the search as a branched search: Either the best set cover contains the largest subset A or it does not. (It works well to start with the largest set.) If it does, you can remove the elements of A from the list of elements, remove the elements of A from the elements of the other subsets, and solve the remaining subset cover problem. If it does not, you can still remove A from the list of subsets and solve the remaining subset cover problem.

So far, this is just a way to arrange a search over everything. But, there are several important ways to take shortcuts in this recursive algorithm. As you find solutions, you can keep a record of the best one that you have found so far. This sets a threshold that you have to beat. At each stage you can total the sizes of the largest threshold-1 subsets remaining, and see if they have enough elements to cover the remaining elements. If not, you can give up; you won't beat the current threshold.

In addition, if in this recursive algorithm any element is only covered by one subset, you can throw in that subset for sure whether or not it is the largest one. (In fact, it's a good idea to add this step even to the greedy algorithm that Bakhtiyor coded.)

These two accelerations can each prune away huge branches from the search tree, and they work even better in combination.

There is also a clever data structure for this type of problem due to Don Knuth called "Dancing Links". He proposed it for the exact cover problem, which is a little bit different from this one, but you can adapt it to minimum subset cover and all of the ideas above. Dancing links is a mesh of intersecting linked lists that allows you check subsets in and out of the recursive search without having to copy the entire input.

Greg Kuperberg
@Greg. I need an algorithm that work fast and solve my problem. And since now I think all the code people offered are so complicated. Some of them even offered to get all possible combinations of S, but this is not the good solution, isn't that?
Bakhtiyor
But does "solve" mean the very best set cover, or a good set cover? And for what size of problem? Because, if you have a large input, the very best set cover is a very hard problem.
Greg Kuperberg
I didn't pretend for **best** set, what I wanted is just **good** sub-sets that covers main set.
Bakhtiyor
Then you might as well use the greedy algorithm that you wrote yourself. In my opinion, it answers your own question.
Greg Kuperberg