views:

189

answers:

5

I have built an application that is used to simulate the number of products that a company can produce in different "modes" per month. This simulation is used to aid in finding the optimal series of modes to run in for a month to best meet the projected sales forecast for the month. This application has been working well, until recently when the plant was modified to run in additional modes. It is now possible to run in 16 modes. For a month with 22 work days this yields 9,364,199,760 possible combinations. This is up from 8 modes in the past that would have yielded a mere 1,560,780 possible combinations. The PC that runs this application is on the old side and cannot handle the number of calculations before an out of memory exception is thrown. In fact the entire application cannot support more than 15 modes because it uses integers to track the number of modes and it exceeds the upper limit for an integer. Baring that issue, I need to do what I can to reduce the memory utilization of the application and optimize this to run as efficiently as possible even if it cannot achieve the stated goal of 16 modes. I was considering writing the data to disk rather than storing the list in memory, but before I take on that overhead, I would like to get people’s opinion on the method to see if there is any room for optimization there.

EDIT Based on a suggestion by few to consider something more academic then merely calculating every possible answer, listed below is a brief explanation of how the optimal run (combination of modes) is chosen. Currently the computer determines every possible way that the plant can run for the number of work days that month. For example 3 Modes for a max of 2 work days would result in the combinations (where the number represents the mode chosen) of (1,1), (1,2), (1,3), (2,2), (2,3), (3,3) For each mode a product produces at a different rate of production, for example in mode 1, product x may produce at 50 units per hour where product y produces at 30 units per hour and product z produces at 0 units per hour. Each combination is then multiplied by work hours and production rates. The run that produces numbers that most closely match the forecasted value for each product for the month is chosen. However, because some months the plant does not meet the forecasted value for a product, the algorithm increases the priority of a product for the next month to ensure that at the end of the year the product has met the forecasted value. Since warehouse space is tight, it is important that products not overproduce too much either.

Thank you

private List<List<int>> _modeIterations = new List<List<int>>();

private void CalculateCombinations(int modes, int workDays, string combinationValues)
    {
        List<int> _tempList = new List<int>();

        if (modes == 1)
        {
            combinationValues += Convert.ToString(workDays);
            string[] _combinations = combinationValues.Split(',');

            foreach (string _number in _combinations)
            {
                _tempList.Add(Convert.ToInt32(_number));
            }
            _modeIterations.Add(_tempList);
        }
        else
        {
            for (int i = workDays + 1; --i >= 0; )
            {
                CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",");
            }
        }
    }
A: 

I would replace the List objects with my own class that uses preallocated arrays to hold the ints. I'm not really sure about this right now, but I believe that each integer in a List is boxed, which means much more memory is used than with a simple array of ints.

Edit: On the other hand it seems I am mistaken: http://stackoverflow.com/questions/1168915/which-one-is-more-effecient-listint-or-int

rslite
Untrue. Generics in C# are decent enough not to box value types. You might be thinking of .NET 1.1 `ArrayList`, or you might be thinking of Java generics. (At least how they worked around 1.4.2) In both cases ints are indeed boxed.
Joren
Yep, if you read my edit to the post you can see I already admitted that.
rslite
+5  A: 

You could process the permutation as soon as you have generated it, instead of collecting them all in a list first:

public delegate void Processor(List<int> args);

private void CalculateCombinations(int modes, int workDays, string combinationValues, Processor processor)
{
    if (modes == 1)
    {
        List<int> _tempList = new List<int>();
        combinationValues += Convert.ToString(workDays);
        string[] _combinations = combinationValues.Split(',');

        foreach (string _number in _combinations)
        {
            _tempList.Add(Convert.ToInt32(_number));
        }
        processor.Invoke(_tempList);
    }
    else
    {
        for (int i = workDays + 1; --i >= 0; )
        {
            CalculateCombinations(modes - 1, workDays - i, combinationValues + i + ",", processor);
        }
    }
}

I am assuming here, that your current pattern of work is something along the lines

CalculateCombinations(initial_value_1, initial_value_2, initial_value_3);

foreach( List<int> list in _modeIterations ) {

    ... process the list ...

}

With the direct-process-approach, this would be

private void ProcessPermutation(List<int> args) 
{
    ... process ...
}

... somewhere else ...

CalculateCombinations(initial_value_1, initial_value_2, initial_value_3, ProcessPermutation);

I would also suggest, that you try to prune the search tree as early as possible; if you can already tell, that certain combinations of the arguments will never yield something, which can be processed, you should catch those already during generation, and avoid the recursion alltogether, if this is possible.

In new versions of C#, generation of the combinations using an iterator (?) function might be usable to retain the original structure of your code. I haven't really used this feature (yield) as of yet, so I cannot comment on it.

Dirk
+1, definitely: if it's possible to tell at the moment of generation whether the current combination is superior or inferior to another combination, you should do that and discard (or remove) the inferior combination before storing it. Or, if you want to end up with *X* combinations, just see if the current combination is inferior to the current top *X* you have stored.
Jeff Sternal
+1, This was a good solution to the question that I asked. It has indeed taken care of the memory issue that I was facing. However, I didn't mark it as the answer because as Eric and Jorge note below, this brute force solution will always have a limit. For the short term, this has allowed me to move on with the issue at hand, thank you for your time.
Irwin M. Fletcher
+2  A: 

The problem lies more in the Brute Force approach that in the code itself. It's possible that brute force might be the only way to approach the problem but I doubt it. Chess, for example, is unresolvable by Brute Force but computers play at it quite well using heuristics to discard the less promising approaches and focusing on good ones. Maybe you should take a similar approach.

On the other hand we need to know how each "mode" is evaluated in order to suggest any heuristics. In your code you're only computing all possible combinations which, anyway, will not scale if the modes go up to 32... even if you store it on disk.

Jorge Córdoba
This is a interesting approach, I have updated my question to reflect the way modes are evaluated.
Irwin M. Fletcher
+1  A: 
if (modes == 1)
{
    List<int> _tempList = new List<int>();
    combinationValues += Convert.ToString(workDays);
    string[] _combinations = combinationValues.Split(',');

    foreach (string _number in _combinations)
    {
        _tempList.Add(Convert.ToInt32(_number));
    }
    processor.Invoke(_tempList);
}

Everything in this block of code is executed over and over again, so no line in that code should make use of memory without freeing it. The most obvious place to avoid memory craziness is to write out combinationValues to disk as it is processed (i.e. use a FileStream, not a string). I think that in general, doing string concatenation the way you are doing here is bad, since every concatenation results in memory sadness. At least use a stringbuilder (See back to basics , which discusses the same issue in terms of C). There may be other places with issues, though. The simplest way to figure out why you are getting an out of memory error may be to use a memory profiler (Download Link from download.microsoft.com).

By the way, my tendency with code like this is to have a global List object that is Clear()ed rather than having a temporary one that is created over and over again.

Brian
+8  A: 

This kind of optimization problem is difficult but extremely well-studied. You should probably read up in the literature on it rather than trying to re-invent the wheel. The keywords you want to look for are "operations research" and "combinatorial optimization problem".

It is well-known in the study of optimization problems that finding the optimal solution to a problem is almost always computationally infeasible as the problem grows large, as you have discovered for yourself. However, it is frequently the case that finding a solution guaranteed to be within a certain percentage of the optimal solution is feasible. You should probably concentrate on finding approximate solutions. After all, your sales targets are already just educated guesses, therefore finding the optimal solution is already going to be impossible; you haven't got complete information.)

What I would do is start by reading the wikipedia page on the Knapsack Problem:

http://en.wikipedia.org/wiki/Knapsack%5Fproblem

This is the problem of "I've got a whole bunch of items of different values and different weights, I can carry 50 pounds in my knapsack, what is the largest possible value I can carry while meeting my weight goal?"

This isn't exactly your problem, but clearly it is related -- you've got a certain amount of "value" to maximize, and a limited number of slots to pack that value into. If you can start to understand how people find near-optimal solutions to the knapsack problem, you can apply that to your specific problem.

Eric Lippert
You are absolutely correct; I need to revisit this approach. I will read up in the areas that you have suggested. Perhaps a question about heuristics is in my future.
Irwin M. Fletcher