views:

2392

answers:

12

For example , given

A = [1,51,3,1,100,199,3], maxSum = 51 + 1 + 199 = 251.

clearly max(oddIndexSum,evenIndexSum) does not work.

The main problem I have is that I can't come up with a selection criterion for an element. A rejection criterion is trivial given a selection criterion.

The standard maximum sub-sequence algorithm doesn't seem to be applicable here. I have tried a dynamic programming approach, but can't come up with that either. The only approach I could come up with was one that used a genetic algorithm.

So how would you approach such a problem? Any help would be appreciated, thanks!

ps: This is not a homework problem, a friend asked me this as we're prepping for interviews.

A: 

max(oddIndexSum,evenIndexSum) does not work

For the example you gave, it does - however, if you have something like: A = [1, 51, 3, 2, 41, 23, 20], you can have 51 + 2 + 23 = 76, or you can have 51 + 41 + 20 = 112, which is clearly larger, and avoids adjacent elements as well. Is this what you're looking for?

Andy Mikula
I should have said max(oddIndexSum,evenIndexSum) does not work in all cases. My bad ...
sundeep
A: 

While you used a bunch of fancy words, isn't this basically just a plain old graph problem of the travelling salesman?

Except in this case you are looking for the most expensive route through the (dense) graph? In this case the vertices are just the numbers themselves, the edges are not directed and have no weight, and all vertices are connected, except to the vertices that had been adjacent to them in the original list?

Joe
@Joe: 1) Yes , it looks like the problem can be reduced to the specialized TSP problem. Thanks !2) You pointing out that I was using "a bunch of fancy words" showed me that I was trying to appear smarted than I am (subconsciously , i promise!) . Thanks again !
sundeep
This isn't the same as this specialised TSP. With [5, 6, 7, 8, 9], the travelling salesman could go 5-8-6-9; but you can't choose 5, 6, 8, 9 as a subsequence. The nodes are ordered in this problem, but they aren't for the TSP.
Bennett McElwee
@Joe: No, it's not TSP -- this problem can be solved in linear time, while TSP is > polynomial. -- MarkusQ
MarkusQ
It is not a TSP. For example, @sth's answer shows O(N**2) in time and O(N) in space algorithm.
J.F. Sebastian
D'oh! Thanks everyone - I did indeed miss a few key elements. Shouldn't be in such a hurry. :) I'm surpised it could be done in O(N) - I'll go look at that.
Joe
No, J.F. Sabastian, do your O() calculations again. There are several algorithms with O(N) time and constant (or O(N)) space posted here,
MarkusQ
@MarkusQ: See my comment to @sth's answer where I've explained why it is not O(N).
J.F. Sebastian
@J.F.Sebastian I saw it. Appending isn't intrinsically O(n) -- you're making assumptions about the underlying language implementation. It could just as well be a cons, which is O(1). If you're going to be assuming such things, what's to stop you from assuming x+y is O(x) (e.g. done with incs)?
MarkusQ
A: 
while you still have elements
     find the largest element, add it to the sum
     remove the element before and after the current
Muad'Dib
This fails on a trivial case: [2, 3, 2].
Bennett McElwee
+7  A: 

You can build the maximal subsequence step by step if you keep two states:

def maxsubseq(seq):
  # maximal sequence including the previous item
  incl = []
  # maximal sequence not including the previous item
  excl = []

  for i in seq:
    # current max excluding i
    if sum(incl) > sum(excl):
      excl_new = incl
    else:
      excl_new = excl

    # current max including i
    incl = excl + [i]

    excl = excl_new

  if sum(incl) > sum(excl):
    return incl
  else:
    return excl


print maxsubseq([1,4,6,3,5,7,32,2,34,34,5])

If you also want to have negative elements in your lists, you have to add a few ifs.

Same -- in lesser lines

def maxsubseq2(iterable):
    incl = [] # maximal sequence including the previous item
    excl = [] # maximal sequence not including the previous item

    for x in iterable:
        # current max excluding x
        excl_new = incl if sum(incl) > sum(excl) else excl
        # current max including x
        incl = excl + [x]
        excl = excl_new

    return incl if sum(incl) > sum(excl) else excl

Same -- eliminating sum()

def maxsubseq3(iterable):
    incl = [] # maximal sequence including the previous item
    excl = [] # maximal sequence not including the previous item
    incl_sum, excl_sum = 0, 0
    for x in iterable:
        # current max excluding x
        if incl_sum > excl_sum:
            # swap incl, excl
            incl, excl = excl, incl
            incl_sum, excl_sum = excl_sum, incl_sum
        else:
            # copy excl to incl
            incl_sum = excl_sum #NOTE: assume `x` is immutable
            incl     = excl[:]  #NOTE: O(N) operation
        assert incl is not excl
        # current max including x
        incl.append(x)
        incl_sum += x
    return incl if incl_sum > excl_sum else excl

Allright, let's optimize it...

Version with total runtime O(n):

def maxsubseq4(iterable):
    incl = [] # maximal sequence including the previous item
    excl = [] # maximal sequence not including the previous item
    prefix = [] # common prefix of both sequences
    incl_sum, excl_sum = 0, 0
    for x in iterable:
        if incl_sum >= excl_sum:
            # excl <-> incl
            excl, incl = incl, excl
            excl_sum, incl_sum = incl_sum, excl_sum
        else:
            # excl is the best start for both variants
            prefix.extend(excl) # O(n) in total over all iterations
            excl = []
            incl = []
            incl_sum = excl_sum
        incl.append(x)
        incl_sum += x
    best = incl if incl_sum > excl_sum else excl
    return prefix + best # O(n) once
sth
This algorithm is O(N**2). Loop over `seq` O(N) times (excl + [x]) which is O(N) -> O(N)*O(N) -> O(N**2).
J.F. Sebastian
`maxsubseq4()` looks like O(N).
J.F. Sebastian
why not `prefix.extend(incl if incl_sum > excl_sum else excl)`? It requires at least twice as less time and memory as `(prefix + best)` variant.
J.F. Sebastian
Just (cons best prefix) and be done with it. O(1), since you're counting.
MarkusQ
+6  A: 

Chris's answer fails on the list [9,10,9], producing 10 instead of 9+9 = 18.

Joe is not quite right. Traveling salesman requires you to visit every city, whereas there's no analog to that here.

One possible solution would be the recursive solution:

function Max_route(A)
    if A's length = 1 
        A[0]
      else
        maximum of
          A[0]+Max_route(A[2...])
          Max_route[1...]

This has the same big-O as a naive fibonacci function, and should yield to some of the same optimizations (e.g. memoization) if you care about efficiency in addition to simply getting a correct answer.

-- MarkusQ

[Edit] ---

Because some people don't seem to be getting this, I want to explain what I meant by memoization and why it matters.

You could wrap the function above so that it only computed the value for each array once (the first time it was called), and on subsequent calls would simply return the saved result. This would take O(n) space but would return in constant time. That means the whole algorithm would return in O(n) time, better than the exponential time of the less cluttered version above. I was assuming this was well understood.

[Second edit]------------------------------

If we expand the above a bit and tease it apart we get:

f []      :- [],0
f [x]     :- [x],x
f [a,b]   :- if a > b then [a],a else [b],b
f [a,b,t] :- 
    ft = f t
    fbt = f [b|t]
    if a + ft.sum > fbt.sum
        [a|ft.path],a+ft.sum
      else
        fbt

Which we can unroll into a pseudo basic using only size n arrays of integers and booleans, and the operations of 1) array indexing and indexed array assignment, 2) integer math, including comparison, 3) if/then/else, and 4) one single loop of O(n):

dim max_sum_for_initial[n],next_to_get_max_of_initial[n],use_last_of_initial[n]

max_sum_for_initial[0] = 0
next_to_get_max_of_initial[0] = -1
use_last_of_initial[0] = false

max_sum_for_initial[1] = a[0]
next_to_get_max_of_initial[1] = -1
use_last_of_initial[1] = true

if a[0] > a[1]
    max_sum_for_initial[2] = a[0]
    next_to_get_max_of_initial[2] = 0
    use_last_of_initial[2] = false
  else
    max_sum_for_initial[2] = a[1]
    next_to_get_max_of_initial[1] = -1
    use_last_of_initial[2] = true

for i from 3 to n
    if a[i]+max_sum_for_initial[i-2] > max_sum_for_initial[i-1]
        max_sum_for_initial[i] = a[i]+max_sum_for_initial[i-2]
        next_to_get_max_of_initial[i] = i-2
        use_last_of_initial[i] = true
      else
        max_sum_for_initial[i] = max+sum_for_initial[i-1]
        next_to_get_max_of_initial[i] = i-1
        use_last_of_initial[i] = false

At the end we can extract the results (in reverse order):

for i = n; i >= 0; i = next_to_get_max_of_initial[i]
    if use_last_of_initial[i] then print a[i]

Note that what we just did manually is something that a good compiler for a modern language should be able to accomplish with tail recursion, memoization, etc.

I hope that is clear enough.

-- MarkusQ

It's O(n).

MarkusQ
What is the linear algorithm you mentioned in comments? The fastest algorithm so far is @sth's one which is O(N**2) in time and O(N) in space. http://stackoverflow.com/questions/584228/algorithm-to-find-the-maximum-subsequence-of-an-array-of-positive-numbers-catch/584343#584343
J.F. Sebastian
@sth's is linear in time and space, as is this one with smart memoization.
MarkusQ
Your algorithm is not linear. A naive recursive Fibonacci function and a more efficient iterative one use different algorithms. Could you give a link to an answer that contains O(N) algorithm (and not just references that it could be done in O(N)).
J.F. Sebastian
@sth updated his answer with O(N) algorithm.
J.F. Sebastian
J.F. Sebastian -- The algorithm above is linear with memoization (which requires O(n) space). I didn't write it out longhand because a) many modern languages include facilities for it, b) it would clutter and obscure the code.
MarkusQ
I've tried to come up with smart memoization for your algorithm (that would be O(N) in time) and failed. I'm not smart enough apparently :) Take a look @mattiast's answer. His memoization technique (dynamic programming) is suitable only for sum of max subsequence. It doesn't work for *subsequence*.
J.F. Sebastian
The crucial part is: `A[i]+B[i-2]` It works for scalar, but for arrays it is O(N) operation. Sooner or later you'll be forced to use some analog of `prefix` array from sth's answer and at this moment your memoization technique should become very smart.
J.F. Sebastian
Dynamic programming works if max subsequence for [0..m) is a subsequence of max subsequence for [0..n) range, where m < n; and we know how to construct solution for [0..n) knowing [0..m) solution. I'm not sure that it is the case here. But I could be mistaken (it is late here).
J.F. Sebastian
@J.F.Sebastian -- The crucial part is that adding an element to a list with a lispish (cons item list) style operation is constant time (just like adding scalars). It is not O(n).
MarkusQ
@Markus@: There is no question that OP question can be answered in O(N) time. @sth demonstrated that. The question is: could we call it a smart memoization of initial recursive algorithm. I don't know whether your last algorithm (w/ boolean array) works (probably it does), but it is not memoization.
J.F. Sebastian
I'll make 2nd attempt at writing memoization using linked lists this time instead of arrays.
J.F. Sebastian
@J.F. Sebastian -- All I did was unroll the algorithm from pseudo functional code into an Algol-family low-level to help you see what was going on. If you go through it, you'll see that the middle version is just an expansion of the first, and the later corresponds chunk by chunk to the middle one.
MarkusQ
P.S. I've added a new question based on this discussion, if you're interested. http://stackoverflow.com/questions/589489/how-much-is-it-fair-to-assume-about-implementation-when-doing-big-o-calculations
MarkusQ
I've posted implementations in Emacs Lisp http://stackoverflow.com/questions/584228/algorithm-to-find-the-maximum-subsequence-of-an-array-of-positive-numbers-catch/586386#586386
J.F. Sebastian
You don't need to memoize anything; the algorithm itself is in linear time.
MSN
@MarkusQ: Note an exact words you use do matter. In your question you use 'list', in your answer you use 'array'. The default assumptions for these two words are different. For example, 'array' associated with O(1) random access if it is not stated otherwise.
J.F. Sebastian
+1  A: 

A recursive answer in strange Prologesque pseudocode:

maxSum([]) = 0
maxSum([x]) = x
maxSum(A) = max(A[0] + maxSum(A[2..n]),
                A[1] + maxSum(A[3..n]))

With appropriate handling of out-of-range indexes.

Edit: This reduces to MarcusQ's nicer answer:

maxSum([]) = 0
maxSum(A) = max(A[0] + maxSum(A[2..n]), maxSum(A[1..n]))

Edit: Here's a version that returns the actual subsequence rather than just its sum. It stretches the limits of my ad hoc pseudo-Prolog-C Chimera, so I'll stop now.

maxSub([]) = []
maxSub(A) = sub1 = [A[0]] + maxSub(A[2..n])
            sub2 = maxSub(A[1..n])
            return sum(sub1) > sum(sub2) ? sub1 : sub2
Bennett McElwee
OP asks about max *subsequence*, not just merely its sum.
J.F. Sebastian
Yes, the title of the question says that, but I inferred from the actual question that the OP was interested only in the sum. Extending this algorithm to return the subsequence as well as its sum is trivial, but doesn't look quite so nice.
Bennett McElwee
If `[]` doesn't not denote a linked list in pseudo-Prolog-C then `[A[0]] + maxSub(A[2..n])` is O(N) operation if it involves creating a new subsequence (if it doesn't then the alg won't work).
J.F. Sebastian
*`doesn't denote`
J.F. Sebastian
Everything is meant to have value semantics, so [] + [] returns a new array (list, whatever). It might be apparent that efficiency is not a consideration to the pseudo-Prolog-C pseudocoder. I was trying for a short and clear presentation of the algorithm, but maybe I didn't succeed.
Bennett McElwee
I prefer to communicate algorithms in executable pseudo-code, but I might be just weird that way. Only 1 (excluding mine) other answer from 10 contains executable examples for the question.
J.F. Sebastian
A: 

To avoid recursion, we can take from reverse than forward,

ie) for Array A[1..n]->

     maxSum(A,n): for all n

         if n=0, maxSum = 0 else
         if n=1, maxSum=A[1] else
                maxSum = max(A[n] + maxSum(A,n-2), maxSum(A,n-1))

To avoid computing of Max(A,n-2), while expanding maxSum(A,n-1), it can be stored and computed. That is why I ask to reverse. ie) maxSum(A,n-1) = max(A[n-1]+ maxSum(A,n-3), maxSum(A,n-2) ) where in Max(A,n-2) is already got, and no need to recalculate ) In otherwords compute maxSum(A,n) for all n starting from 1 to n using above formula to avoid recomputing.

ie) n=2, maxSum = max(A[1]+maxSum(A,0), maxSum(A,1) ) ie) n=3, maxSum = max(A[2]+maxSum(A,2), maxSum(A,2) ) and so on .. and reach last n. this will be o(n).

lakshmanaraj
This takes exponential time when a linear time algorithm is possible.
Jay Conrod
It is a pseudo code we can store MaxSum(A,n-2) before computing maxsum(A,n-1) and we can take to O(n)
lakshmanaraj
+1  A: 

@MarkusQ's answer as a Python oneliner (modified as @recursive suggested in the comments):

f = lambda L: L and max([L[0]] + f(L[2:]), f(L[1:]), key=sum)

Example:

>>> f([1,51,3,1,100,199,3])
[51, 1, 199]

It is inefficient, but It might be used to test faster solutions.

Same -- in Emacs Lisp

(defun maxsubseq (L)
  "Based on MarkusQ's and sth's answers."
  (if (not L) L
    (let ((incl (cons (car L) (maxsubseq (cddr L))))
          (excl (maxsubseq (cdr L))))
      (if (> (sum incl) (sum excl)) incl excl))))
(defun sum (L) (apply '+ L))

Iterative version (O(N) if tail-recursion is available)

It is based on @sth's answer:

(defun maxsubseq-iter-impl (L excl incl)
  (let ((next (if (> (car excl) (car incl)) excl incl)) (x (car L)))
    (if (not L) (cdr next)
      (maxsubseq-iter-impl (cdr L) next
                           (cons (+ x (car excl)) (cons x (cdr excl)))))))

(defun maxsubseq-iter (L) (reverse (maxsubseq-iter-impl L '(0) '(0))))

Example:

(require 'cl)
(loop for f in '(maxsubseq maxsubseq-iter) 
      collect (loop for L in '((1 51 3 1 100 199 3) (9 10 9)) 
      collect (f L)))

Output:

(((51 1 199) (9 9)) ((51 1 199) (9 9)))
J.F. Sebastian
slightly shorter f = lambda L: L and max([L[0]] + f(L[2:]), f(L[1:]), key=sum)
recursive
@recursive: Thanks. I've updated the answer.
J.F. Sebastian
A: 

We can use an auxiliary array B[0..n-1], where B[i] is the maximum sum of the elements A[0..i] and C[0..n-1], where C[i] is boolean telling if A[i] is in the maximum sum subsequence:

B[0]=max(A[0],0); C[0]=(A[0]>0)
B[1]=max(B[0],A[1]); C[1]=(A[1]>B[0])
for i in [2..n-1]
  if A[i]+B[i-2] > B[i-1]
      C[i]=True
      B[i]=A[i]+B[i-2]
  else
      C[i]=False
      B[i]=B[i-1]
mssq=[]
i=n-1
while i>=0
  if C[i]
    push(A[i],mssq)
    i=i-2
  else
    i=i-1
return mssq

This clearly works in O(n) time and space. Actually, this is the same as MarcusQ's solution, only reversed and optimized.

mattiast
The question is about maximum *subsequence*, not its sum. It is not a trivial change if you'd like to stay in O(N).
J.F. Sebastian
Fixed. Now this should also work with negative elements.
mattiast
A: 

Edit: This is really a dupe of sth's, but I didn't realize it until after I posted it.

You can do this in constant space and linear time, assuming you don't need to keep track of which items contribute to the final sum.

Pseudocode:

sum_excluded_last_item= 0
sum_included_last_item= 0

for each item in list
    if (item>0)
        last_sum_excluded_last_item= sum_excluded_last_item
        sum_excluded_last_item= max(sum_included_last_item, sum_excluded_last_item + item)
        sum_included_last_item= last_sum_excluded_last_item + item
    else
        sum_excluded_last_item= max(sum_excluded_last_item, sum_included_last_item)
        sum_included_last_item= sum_excluded_last_item

max_sum= max(sum_excluded_last_item, sum_included_last_item)

Getting the actual list is an exercise left up to the reader. Or me if you add more comments. But it should be obvious from the algorithm.

MSN
A: 

MarkusQ's code appears to completely skip over a[2]. I'm not smart enough to figure out where it should figure into the reckoning.

+1  A: 

Here is an answer done using dynamic programming using the same base concept as that used by MarkusQ. I am just calculating the sum, not the actual sequence, which can produced by a simple modification to this code sample. I am surprised nobody mentioned this yet, because dynamic programming seems a better approach rather than recursion + memoization !

int maxSeqSum(int *arr, int size) {
  int i, a, b, c;
  b = arr[0];
  a = (arr[1] > arr[0]) ? arr[1]: arr[0];
  for(i=2;i<size;i++) {
    c = (a > (b + arr[i]))? a : (b + arr[i]);
    b = a;
    a = c;
  }
  return a;
}
vinutheraj