tags:

views:

237

answers:

5

I am too dense to solve the following optimization problem:

There is a 2D array, let's say symbols vs time, for example

A 1114334221111 
B 9952111111111
C 1113439111131
D 1255432245662

There is also a list of symbols, for example:

CABDC

You must choose values from the array in order of the symbols, but you can repeat a symbol as many times as you like. You must choose at least one value for each symbol, and you must go all the way through the list. For example, one possibility could be:

  CCCAAAAAABDDC
  1114334221661 = 35

Is there an algorithm to choose the list of symbols which sum to the maximum value? On first blush it looks like some kind of backtracking algorithm, but that could degenerate to exponential time.

A: 

You could go for a greedy algorithm, with some restrictions. You'll always have to chose between the current or the next symbol. If the equivalent for the current symbol is bigger then the next one (and you will be able to put all the remaining symbols later), then use current symbol. If the next symbol number is bigger, then pick next.If they're the same, you'll need some extra logic, to decide.

Samuel Carrijo
+2  A: 

This looks to me like a slight variation on shortest path. Consider each element of your array to be nodes in a digraph. You should be able to modify any of the standard shortest path algorithms to add the symbol ordering.

Fragsworth
+4  A: 

The approach I'd probably take would look like this:

Create an array ("X") of (N+1)xM elements, where N = the "width" of your 2D array and M = the number of symbols in your list.

Set X[0][i] = 0 for all values i (in the range of M). This is because the maximum possible sum of the first 0 chosen items is 0. We can also fill in X[a][b] as 0 for any b>=a since those positions are ones that we couldn't possible get to (we wouldn't have picked enough symbols to be at that symbol yet). Similarly, we can set a "triangle" of values at the tail end of the array to 0 as well, since in order to be at one of those indices we'd have to pick too many repetitions of earlier symbols to be able to pick at least one item for later symbols.

Now, set X[1][i] to be equal to the maximum possible sum of the first 1 chosen elements if the current symbol being chosen is symbol i in the sequence. This is fairly trivial to compute - it's just the corresponding value for that symbol from the array.

Now, set X[2][i] to be equal to the maximum possible sum of the first 2 chosen elements if the current symbol being chosen is symbol i in the sequence. This is also semi-trivial, though not as simple as [1] was - after all, now it's the corresponding value of the symbol plus the maximum of either X[1][i-1] or X[1][i] - because we could either begin the current symbol at this position, or have already begun it at an earlier position.

Continue the algorithm for each X[k] (setting X[k][i] equal to the max of either X[k-1][i-1] or X[k-1][i], plus the corresponding symbol value for the current i) until k reaches N. The maximum value in your X[k] column is your maximum result.

Since overall you only need to look at each array element a constant number of times, the overall algorithm runs in O(MN) time.

(Note that technically you don't need the full NxM array in memory all the time, only the previous column and the current column. It's just easier to explain it in terms of the full array. Thus the optimized version of this algorithm uses O(M) memory.)

Example result array from your example:

       0  1  2  3  4  5  6  7  8  9  10 11 12 13
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+
0 - C | 0| 1| 2| 3| 6|10|13|22|23|24| 0| 0| 0| 0|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+
1 - A | 0| 0| 2| 3| 7|10|13|17|24|26|27| 0| 0| 0|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+
2 - B | 0| 0| 0| 7| 9|10|11|14|18|25|26|28| 0| 0|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+
3 - D | 0| 0| 0| 0|12|16|19|21|23|27|32|38|44| 0|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+
4 - C | 0| 0| 0| 0| 0|16|19|28|29|30|31|33|36|45|
      +--+--+--+--+--+--+--+--+--+--+--+--+--+--+

If you store (at each array position) which previous symbol was used as part of that position's total, it's easy to then just read back through the path that traces from the bottom-right corner to the top-left to figure out what the maximal sequence is. Alternatively, you can simply trace back by looking at which of the two values - left or up-left - is the larger from your current position. In this case, the maximal sequence is CABDDDDDDDDDC.

Amber
He says that you are supposed to find a list of symbols, not the number of repetitions of each symbol given a list. The author needs to clarify this, if this is the incorrect description of his problem.
Unknown
You are given a list of symbols, you're supposed to find another list of symbols (generated by expanding one or more symbols from the original list through repetition) that obtains the maximum sum.
Amber
The author doesn't specifically say that though. I'm not saying your interpretation can't be correct, but right now it could also mean that he wants you to generate the original list of symbols.
Unknown
The author seems to be fairly consistent with their language in using "there is ___" to specify things you are provided with, as opposed to things you are supposed to generate. "There is a list of symbols" strongly implies (at least to me) that you are provided with such a list.
Amber
Thanks, seems to work well! I thought it was going to be a more tricky DP problem but I suppose the unidirectional nature of the search makes it linear time.
sehugg
Well, the goal of dynamic programming is to 'design out' backtracking so that you *can* just work through in some kind of unidirectional manner. :) It's pretty much all about how you set up what you're calculating such that you can base future calculations off of previous ones (as you probably know).
Amber
A: 

Is there an algorithm to choose the list of symbols which sum to the maximum value?

This should be easy since you said you can repeat a symbol as many times as you like and you made no restrictions on the number of symbols or that each symbol must be used.

(I suspect this is probably too easy to be the case and that you just didn't describe the whole problem but here is the solution anyway)

for i through the length of array in time
    symbol, value = get_max_value(array[i]) # iterates through every symbol to find max at the given time
    print symbol, value
Unknown
"You must choose at least one value for each symbol, and you must go all the way through the list."
Amber
Dav, see my comment.
Unknown
+2  A: 

you can make it a shortest path problem, but unlike what Fragsworth said, you dont need to change the algorithm, just how you present the data:

each node in your graph is a single cell in your 2D array, and you connect the nodes according to your list, with weight set to 10 - cellValue. you dont connect nodes that doesnt follow the rules (you wont connect b[4] to a[5] because it's not in the list "order")

examples:

c[0] connect to c[1] with weight 9 (10 - c[1] value that is 1) => this is the option to choose "C"

c[0] connect to a[1] with weight 9 (10 - a[1] value that is 1) => this is the option to choose "A"

b[2] connect to b[3] with weight 8 (10 - b[3] value that is 2) => this is the option to choose "B"

b[2] connect to d[3] with weight 5 (10 - d[3] value that is 5) => this is the option to choose "D"

the only problem you got is to get rid of the option that you chose "CCCCC..." all the way, that is countered by refering the second "C" of your "CABDC" list as (C2) and only connect it from the D nodes (or other C2 nodes) in your example.

Now you run a any standard shortest path algorithim (without having to change it) starting from c[0] and ending at c2[n], because the weights are the opposite on the value, the shortest path you get will be the sum of the max values.

amikazmi