ansaurus

Question

Algorithm to find the maximum sum in a sequence of overlapping intervals

Answer 1

A:

Maybe an approach like the one in this answer could be used; it is O(n), at least for that problem. It would mean iterating once through the intervals and keeping track of only those interval combinations that could still lead to an optimal final solution.

sth 2010-07-14 03:49:50

Answer 2

A:

Sounds like a variation on the Knapsack problem. You might find some inspiration by searching for solutions to it.

How many intervals are we talking about? If it's only about 5 (as in your example), it's probably more practical to just try every combination. If there are more, would an approximation of the ideal solution do? Again, Knapsack solutions (such as George Dantzig's greedy approximation algorithm) might be a good place to start.
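For a handful of intervals, "try every combination" can be as simple as the sketch below. It assumes each interval is a `(start, stop, score)` tuple with closed endpoints (two intervals sharing an endpoint count as overlapping), and it uses the five intervals listed in Answer 3 as sample data; the function names are made up for illustration. It checks all 2^n subsets, so it is only a baseline or a checker for small inputs, not the Knapsack-style approximation mentioned above.

```python
from itertools import combinations

def overlaps(a, b):
    # closed (start, stop, score) intervals overlap if they share any point
    return a[0] <= b[1] and b[0] <= a[1]

def brute_force_max(intervals):
    """Try every subset and keep the best sum among pairwise non-overlapping ones.
    Exponential (2^n subsets), so only practical for a handful of intervals."""
    best = 0  # the empty selection scores 0
    for r in range(1, len(intervals) + 1):
        for subset in combinations(intervals, r):
            if all(not overlaps(a, b) for a, b in combinations(subset, 2)):
                best = max(best, sum(score for _, _, score in subset))
    return best

EXAMPLE = [(0, 5, 15), (4, 9, 18), (8, 21, 19), (10, 15, 12), (25, 30, 25)]
print(brute_force_max(EXAMPLE))  # 59
```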

Damovisa 2010-07-14 03:52:21

@Damovisa: The input sets are very large.

efficiencyIsBliss 2010-07-14 03:59:11

@Dharmesh - that's a shame!

Damovisa 2010-07-14 04:02:29

Answer 3

A:

First of all, I think the maximum is 59, not 55. If you choose intervals [0-5], [8-21], and [25-30], you get 15+19+25 = 59. You can use some form of dynamic programming to handle this.

First, sort all the intervals by their starting point, then iterate from the end to the start. For each item i, compute the maximum sum obtainable from that item to the last one as S[i] = max(w[i] + S[j], S[i+1]), where w[i] is the score of item i and j is the first item that does not overlap item i (that is, the first item whose start is larger than item i's end). To avoid recomputation, store the maximum partial sum S[i] for every item as you go, so each S[j] is already available when you need it.

To clarify, let me solve your example this way. First, sort your intervals:

 #  interval  score
 1:  0- 5      15
 2:  4- 9      18
 3:  8-21      19
 4: 10-15      12
 5: 25-30      25

So,

 S[5] = 25
 S[4] = max(12+S[5], 25)=37
 S[3] = max(19+S[5], S[4])=max(19+25,37)=44
 S[2] = max(18+S[4], S[3])=max(18+37,44)=55
 S[1] = max(15+S[3], S[2])=max(15+44, 55)=59

This is an adaptation of the algorithm in this post, but unfortunately it doesn't have the nice O(n) running time. Finding j by scanning forward can take O(n) steps per item, so a degenerate list in which each entry overlaps most of the entries after it makes the whole thing O(n^2).
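A minimal Python sketch of the back-to-front recurrence above, assuming closed intervals given as `(start, end, score)` tuples; the name `max_sum_from_end` is hypothetical. Python lists are 0-based, so `table[0]` here corresponds to S[1] in the worked example. The forward `while` scan for j is the part that can degrade to O(n^2).

```python
def max_sum_from_end(intervals):
    """Return (best total, S) where S[i] is the best sum using items i..n-1
    after sorting by start. Intervals are closed (start, end, score) tuples."""
    items = sorted(intervals, key=lambda t: t[0])  # sort by starting point
    n = len(items)
    S = [0] * (n + 1)  # S[n] = 0: no items left to choose from
    for i in range(n - 1, -1, -1):
        _, end, score = items[i]
        # j = first later item whose start is larger than this item's end;
        # this linear scan is what can make the whole loop O(n^2)
        j = i + 1
        while j < n and items[j][0] <= end:
            j += 1
        # either take item i (plus the best from j onward) or skip it
        S[i] = max(score + S[j], S[i + 1])
    return S[0], S[:n]

total, table = max_sum_from_end(
    [(0, 5, 15), (4, 9, 18), (8, 21, 19), (10, 15, 12), (25, 30, 25)])
print(total, table)  # 59 [59, 55, 44, 37, 25]
```

Because the items are sorted by start, every item between i+1 and j-1 overlaps item i, and every item from j onward starts after item i ends, which is why S[j] is the right value to add.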

Dysaster 2010-07-14 07:55:32

@Dysaster: Yes, the total was wrong. It should be 59.

efficiencyIsBliss 2010-07-14 15:41:20

Answer 4

+3 A:

This is a weighted variation of interval scheduling; it's solvable in O(N log N) with dynamic programming.

Let an interval be g(start, stop, score), and let the intervals be sorted by stop. For simplicity, let's assume for now that all stop values are unique.

Let best[i] be the best score we can get when we're allowed to use g[1], ..., g[i]. We don't have to use them all, of course, and generally we can't because the subset of intervals we use must be non-overlapping.

Clearly best[0] = 0. That is, since we can't use any interval, the best score we can get is 0.
For any 1 <= k <= N, we have:
- best[k] = max( best[k-1], best[j] + g[k].score ), where
  - j is the largest index such that g[j].stop < g[k].start (j = 0 if no interval stops before g[k] starts)

That is, given that we're allowed to use g[1], ..., g[k], the best we can do is the better-scoring of these two options:

- We do not include g[k]. Thus, the score of this option is best[k-1], because that's the best we can do with g[1], ..., g[k-1].
- We include g[k], and to its left we do the best we can with all the genes that don't overlap with g[k], i.e. all g[1], ..., g[j], where g[j].stop < g[k].start and j is as large as possible. Thus, the score of this option is best[j] + g[k].score.

(Note the optimal-substructure and overlapping-subproblems properties of dynamic programming embodied in the equation above.)

The overall answer to the question is best[N], i.e. the best score we can get when we're allowed to use all the genes. Oops, did I say genes? I mean intervals.

This is O(N log N) because:

- Sorting all the intervals takes O(N log N).
- Finding j for each k takes O(log N) using binary search, so all N lookups take O(N log N) in total.
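Here is a rough Python sketch of the recurrence, assuming closed intervals given as `(start, stop, score)` tuples with start <= stop; `weighted_interval_scheduling` is a made-up name. Because the condition is the strict g[j].stop < g[k].start, the number of intervals that qualify is `bisect.bisect_left(stops, start)`. If touching endpoints were allowed (stop <= start), `bisect_right` would be the matching call instead.

```python
from bisect import bisect_left

def weighted_interval_scheduling(intervals):
    """Return the table best[0..N] for closed (start, stop, score) intervals;
    best[N] is the answer. Runs in O(N log N)."""
    g = sorted(intervals, key=lambda t: t[1])  # sort by stop: O(N log N)
    stops = [stop for _, stop, _ in g]
    best = [0] * (len(g) + 1)  # best[0] = 0
    for k in range(1, len(g) + 1):
        start, _, score = g[k - 1]  # g[k] in the text is g[k - 1] here
        # j = how many intervals stop strictly before this one starts, i.e. the
        # largest 1-based j with g[j].stop < g[k].start (0 if none): O(log N)
        j = bisect_left(stops, start)
        best[k] = max(best[k - 1], best[j] + score)
    return best

best = weighted_interval_scheduling(
    [(0, 5, 15), (4, 9, 18), (8, 21, 19), (10, 15, 12), (25, 30, 25)])
print(best[-1])  # 59
```

On this data the full table is `[0, 15, 18, 30, 34, 59]`. Since start <= stop and the stops are sorted, j never exceeds k-1, so `best[j]` is always already filled in.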

If several genes can have the same stop value, nothing changes: you still have to search for the rightmost j. In Python, for example, this is easy with bisect_right. In Java, where the standard library binary search doesn't guarantee which index is returned when there are ties, you can (among many options) follow it with a linear search (O(N) per lookup in the worst case) or with another series of binary searches to find the rightmost index.
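One more option, sketched here in Python so it can be compared against `bisect`: a hand-written "lower bound" binary search never looks for a matching element at all. It returns the boundary between the elements `< x` and the rest, so duplicates cannot make it ambiguous and each lookup stays O(log N). The function name `count_less_than` is hypothetical; the same loop ports directly to Java.

```python
from bisect import bisect_left, bisect_right

def count_less_than(stops, x):
    """Hand-rolled lower bound: number of elements of sorted `stops` that are < x.
    Ties do not matter because it returns a boundary, not a matching index."""
    lo, hi = 0, len(stops)
    while lo < hi:
        mid = (lo + hi) // 2
        if stops[mid] < x:
            lo = mid + 1  # boundary is to the right of mid
        else:
            hi = mid      # stops[mid] >= x, boundary is at or left of mid
    return lo

stops = [3, 5, 5, 5, 8]
print(count_less_than(stops, 5), bisect_left(stops, 5), bisect_right(stops, 5))  # 1 1 4
```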

Oops did I say genes again? I mean intervals.

Related questions

Extension of binary search to find the first and last index of the key value

polygenelubricants 2010-07-14 09:58:39

And yes, I passed the bot. Wink-wink.

polygenelubricants 2010-07-14 10:09:13

@polygenelubricants: Figures. Thanks!

efficiencyIsBliss 2010-07-14 15:44:13

+1: Seems right (even with negative scores). Not sure why I was thinking O(n^2) is the best known for this. Will delete my answer.

Moron 2010-07-14 16:38:27

Answer 5

A:

I thought of this a bit and came up with something.

Interval Trees provide an efficient way of finding all the intervals that overlap a given interval. Walking through the entire set of intervals, we can find all the overlapping intervals for a given one. Once we have these, we can find the interval with the highest score, store it and move on.

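Building the tree takes O(N log N) time, and a lookup takes O(log N + m) time, where m is the number of overlapping intervals reported. Because we do a lookup for every element, the solution is O(N log N) as long as each interval overlaps only a few others.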

However, if we face something like the example above, where the highest-scoring interval in one group reduces the total, the algorithm fails, because we have no way of knowing beforehand that the highest-scoring interval should not be used. The obvious way around this would be to calculate both (or all) totals whenever we are not sure, but that puts us back at a potentially O(N^2) or worse solution.
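The failure mode is easy to reproduce without building an interval tree. The sketch below is a simplified stand-in for "keep the highest-scoring interval of each overlapping group": it greedily keeps the best-scoring interval that doesn't overlap anything already kept. It assumes closed `(start, stop, score)` tuples, and the three intervals are a made-up counterexample, not data from the question.

```python
def overlaps(a, b):
    # closed (start, stop, score) intervals overlap if they share any point
    return a[0] <= b[1] and b[0] <= a[1]

def greedy_by_score(intervals):
    """Repeatedly keep the highest-scoring interval that does not overlap
    anything already kept. NOT guaranteed to be optimal."""
    chosen = []
    for iv in sorted(intervals, key=lambda t: t[2], reverse=True):
        if all(not overlaps(iv, c) for c in chosen):
            chosen.append(iv)
    return sum(score for _, _, score in chosen)

# Hypothetical counterexample: one long interval overlapping two short ones
A, B, C = (0, 10, 10), (0, 4, 6), (6, 10, 6)
print(greedy_by_score([A, B, C]))  # 10, but choosing B and C gives 12
```

The long interval wins its group on score alone, yet the two short intervals it blocks are worth more together. That is exactly the case a local "best of the group" rule cannot see.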

efficiencyIsBliss 2010-07-14 15:39:06
