
I execute around 2000 tests on a grid, each test run as a separate grid task. The tests have a rather long startup time. Total execution takes 500 hours, but finishes in less than 10 hours on a 60-node Sun Grid Engine cluster. Test runtimes vary from 5 minutes to 90 minutes. Combining tests without much intelligence gave some performance gain. I would like to create "tasks" of approximately equal size. How can I do so?

(What we do now: sort all the tests and keep adding them to a task until the sum of execution times is approximately 5 hours. Looking for something better.)
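A minimal sketch of that current approach (Python for brevity; `pack_tasks` and the 300-minute cap are illustrative assumptions, not the actual harness code):

```python
def pack_tasks(test_times, limit=300):
    """Greedily group test runtimes (minutes) into tasks of roughly
    `limit` minutes each: sort longest-first, fill a task until adding
    the next test would exceed the limit, then start a new task."""
    tasks = []
    current, total = [], 0
    for t in sorted(test_times, reverse=True):
        if current and total + t > limit:
            tasks.append(current)
            current, total = [], 0
        current.append(t)
        total += t
    if current:
        tasks.append(current)
    return tasks

print(pack_tasks([90, 80, 70, 60, 50, 40, 30, 20, 10, 5]))
```

The weakness is visible even on small inputs: tasks are capped near the limit, but the last task can end up much shorter than the others.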

+11  A: 

Doing this optimally is NP-complete. This is a variation of the partition problem, which is a special case of the subset sum problem, which is itself a special case of the knapsack problem.

In your case you probably don't need an exact solution, so some heuristics should get you something "good enough" in a reasonable amount of time. See the Methods section of the partition problem page for a description of some approaches.
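For example, the simplest such heuristic, greedy multiway number partitioning, can be sketched as follows (Python for brevity; `greedy_partition` is an illustrative name, not a library function):

```python
import heapq

def greedy_partition(times, k):
    """Greedy heuristic for partitioning `times` into k roughly equal
    buckets: take jobs longest-first and always assign the next job to
    the currently least-loaded bucket (a min-heap keyed on load)."""
    # (load, index, jobs): the unique index breaks ties so the job
    # lists themselves are never compared by the heap
    buckets = [(0, i, []) for i in range(k)]
    heapq.heapify(buckets)
    for t in sorted(times, reverse=True):
        load, i, jobs = heapq.heappop(buckets)  # least-loaded bucket
        jobs.append(t)
        heapq.heappush(buckets, (load + t, i, jobs))
    return [(load, jobs) for load, _, jobs in buckets]
```

With 2000 tests and 60 buckets this runs in well under a second and typically lands every bucket within one job-length of the others.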

Laurence Gonsalves
+1  A: 

Your problem sounds a little like a shop scheduling problem. There are all kinds of different sequencing approaches, some of which are described here. Sorting in increasing order of processing time, for instance, will minimize the mean waiting time and a whole bunch of other measures. If you elaborate a bit more on the objective, the setup times, the processing times, and any interdependencies, that would help.
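The shortest-processing-time-first rule is easy to check numerically (a small sketch; `mean_wait` is an illustrative helper, not part of any scheduling library):

```python
def mean_wait(times):
    """Mean waiting time when jobs run back-to-back in the given order:
    each job waits for the total runtime of everything before it."""
    wait, elapsed = 0, 0
    for t in times:
        wait += elapsed
        elapsed += t
    return wait / len(times)

jobs = [30, 5, 90, 20]
spt = sorted(jobs)  # shortest processing time first
```

Running the jobs in SPT order never gives a worse mean waiting time than the original order.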

Grembo
+2  A: 

What you are looking for is the Partition problem for k sets.

There is some literature on k = 3, called the 3-partition problem, which is NP-complete in the strong sense.

There are many heuristics that should give an approximate result quickly.

I suggest you start here: http://en.wikipedia.org/wiki/Partition_problem
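One of the heuristics described on that page, the largest differencing (Karmarkar-Karp) method, can be sketched for the two-set case as follows (Python for brevity; for your 60-way split you would need the k-way generalization):

```python
import heapq

def karmarkar_karp_diff(nums):
    """Largest differencing method for two-set partition: repeatedly
    replace the two largest numbers with their difference (committing
    to placing them in opposite sets). Returns the final difference
    between the two set sums. A heuristic: good, but not always optimal."""
    heap = [-n for n in nums]      # max-heap via negation
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)   # largest
        b = -heapq.heappop(heap)   # second largest
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0
```

For example, on [8, 7, 6, 5, 4] it returns a difference of 2, even though a perfect split into 15 + 15 exists, which illustrates the "approximate but fast" trade-off.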

Hope this helps.

Moron
+2  A: 

This is a version of the subset-sum problem, and is NP-complete. Your best bet is to employ some subset-sum heuristics.

BlueRaja - Danny Pflughoeft
A: 

Looking at the links Laurence posted, I thought I would try whipping something up. The algorithm is to assign the longest remaining test to the shortest task list (repeat until all the tests are assigned). Using your numbers and random test times, the standard deviation was pretty low: less than 2 minutes across several runs (code is in C#, but nothing that wouldn't be trivial to convert):

    // Assumes a min-PriorityQueue<Task> ordered by total test time, and a
    // Task class with an addTest(int) method that accumulates that total.
    private static Random random = new Random();

    private static void BuildJobs()
    {
        PriorityQueue<Task> tasks = new PriorityQueue<Task>();

        // create a task list for each node
        for (int i = 0; i < 60; i++)
        {
            tasks.Enqueue(new Task());
        }

        // generate random test times as stand-ins for the real runtimes
        int[] testList = new int[2000];
        for (int i = 0; i < testList.Length; i++)
        {
            testList[i] = random.Next(5, 91); // 5..90 minutes (upper bound is exclusive)
        }

        // order from longest to shortest
        Array.Sort<int>(testList);
        Array.Reverse(testList);

        // add the longest remaining test to the currently shortest task list
        foreach (int time in testList)
        {
            Task t = tasks.Dequeue();
            t.addTest(time);
            tasks.Enqueue(t);
        }

        Debug.WriteLine(CalculateStdDev(tasks));
    }
Dolphin