views: 76

answers: 1

I wrote this program to test how long it would take to "solve" the set-cover problem.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using MoreLinq;

namespace SetCover
{
    class Program
    {
        const int maxNumItems = 10000;
        const int numSets = 5000;
        const int maxItemsPerSet = 300;

        static void Main(string[] args)
        {
            var rand = new Random();
            var sets = new List<HashSet<int>>(numSets);
            var cover = new List<HashSet<int>>(numSets);
            var universe = new HashSet<int>();
            HashSet<int> remaining;
            var watch = new Stopwatch();


            Console.Write("Generating sets...");
            for (int i = 0; i < numSets; ++i)
            {
                int numItemsInSet = rand.Next(1, maxItemsPerSet);
                sets.Add(new HashSet<int>());

                for (int j = 0; j < numItemsInSet; ++j)
                {
                    sets[i].Add(rand.Next(maxNumItems));
                }
            }
            Console.WriteLine("Done!");

            Console.Write("Computing universe...");
            foreach (var set in sets)
                foreach (var item in set)
                    universe.Add(item);
            Console.WriteLine("Found {0} items.", universe.Count);

            watch.Start();

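            // These preprocessing steps turned out to do more harm than good, so they are disabled: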
            //Console.Write("Removing subsets...");
            //int numSetsRemoved = sets.RemoveAll(subset => sets.Any(superset => subset != superset && subset.IsSubsetOf(superset)));
            //Console.WriteLine("Removed {0} subsets.", numSetsRemoved);


            //Console.Write("Sorting sets...");
            //sets = sets.OrderByDescending(s => s.Count).ToList();
            //Console.WriteLine("{0} elements in largest set.", sets[0].Count);


            Console.WriteLine("Computing cover...");
            remaining = universe.ToHashSet();
            while (remaining.Any())
            {
                Console.Write("  Finding set {0}...", cover.Count + 1);
                var nextSet = sets.MaxBy(s => s.Intersect(remaining).Count());
                remaining.ExceptWith(nextSet);
                cover.Add(nextSet);
                Console.WriteLine("{0} elements remaining.", remaining.Count);
            }
            Console.WriteLine("{0} sets in cover.", cover.Count);

            watch.Stop();

            Console.WriteLine("Computed cover in {0} seconds.", watch.Elapsed.TotalSeconds);

            Console.ReadLine();
        }
    }

    public static class Extensions
    {
        public static HashSet<TValue> Clone<TValue>(this HashSet<TValue> set)
        {
            var tmp = new TValue[set.Count];
            set.CopyTo(tmp, 0);
            return new HashSet<TValue>(tmp);
        }

        public static HashSet<TSource> ToHashSet<TSource>(this IEnumerable<TSource> source)
        {
            return new HashSet<TSource>(source);
        }
    }
}

This is just a greedy, sub-optimal solution, but it still took 147 seconds to run. I think, however, that this solution should be pretty close to optimal, so it should be good enough for my purposes. How can I speed it up, though?

I commented out a few lines because they do more harm than good. Edit: Computing the universe should actually not be a part of the timing... that can be known beforehand.
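
If it helps, here's a rough, untested sketch of one variation I've been considering: counting the overlap directly with `remaining.Contains` instead of allocating a new set via `Intersect` for every candidate. The `GreedyCover` helper name is just made up for this sketch; it's not in the program above.

// Untested sketch: same greedy loop as above, but counting the overlap with a
// plain loop instead of allocating a new set via Intersect() on every candidate.
// (This would drop into the Program class above; it only needs System.Collections.Generic.)
static List<HashSet<int>> GreedyCover(List<HashSet<int>> sets, HashSet<int> universe)
{
    var cover = new List<HashSet<int>>();
    var remaining = new HashSet<int>(universe);

    while (remaining.Count > 0)
    {
        HashSet<int> best = null;
        int bestOverlap = 0;

        foreach (var set in sets)
        {
            // Count how many still-uncovered items this set would contribute.
            int overlap = 0;
            foreach (var item in set)
                if (remaining.Contains(item))
                    ++overlap;

            if (overlap > bestOverlap)
            {
                bestOverlap = overlap;
                best = set;
            }
        }

        if (best == null)
            break; // no set covers anything still remaining (shouldn't happen here)

        remaining.ExceptWith(best);
        cover.Add(best);
    }

    return cover;
}

Whether avoiding the intermediate sets actually buys much, I haven't measured.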

+1  A: 

I haven't gone deeply into the details of your code/algorithm, but I'm going to use some theory to advise you. As henk commented, in order to perform a "good" benchmark you MUST remove all unneeded code and run your program in Release mode with full optimizations, from the command line.

Then, remember that you are running managed code: C# (and Java) are designed for interoperability, not for raw performance, although both are still good platforms. If you need performance, you could either reimplement your code in C++ or try Mono with AOT (the ahead-of-time compiler), which boosts performance a lot:

 mono --aot=full YourProgram.exe

Now more about benchmarks and optimality: have you compared your results with others? Did you run other set-cover algorithms on your same hardware, or can you compare your hardware to others that ran the same algorithm?

And... how close is your solution to optimal? Can you provide [yourself] an estimate? The key is in LINQ, which I hate because you trade control of your code for simplicity. What's the complexity of each LINQ call? If each LINQ call is O(n), your algorithm is O(n^3). As a start, I would suggest replacing

remaining.Any()

with

remaining.Count > 0

to shave an order of magnitude off the complexity.

These are just suggestions; I hope they have been of help.

djechelon
`remaining.Any()` only runs 15 times with 1250 elements... I don't think it's a bottleneck, and I'd suspect it's `O(1)` anyway, no? Your suggestions are good though, I didn't know about that Mono thing. I don't know of any other C# implementations of set-cover or I'd be using them. Porting it to C++ or something is not a problem, but I need to get the theory behind the algorithm right first. By "optimal" I mean "produces the fewest sets", not optimal in terms of efficiency. I don't know what the optimal solution is, because the problem is NP-complete and it would take too darn long to solve exactly.
Mark
I'm pretty sure the most expensive line is `sets.MaxBy(s => s.Intersect(remaining).Count())` because it has to compute the intersection of 2 sets about 5000 times. I'm not sure exactly how costly the `Intersect` method is... at least `O(n log n)` I think.
Mark
Yeah, I read the code of `Any()` and it's actually O(1). I also found on Wikipedia that the greedy algorithm usually works well and is polynomial, so I would now focus on the runtime environment :)
djechelon
Someday :) I've got a million ideas floating around in my head... every now and then I like to tackle what I think will be the biggest hurdle without actually completing the project... I'm hoping that if I just solve all the tricky stuff beforehand, it'll be smooth sailing...
Mark