views:

122

answers:

3
+1  Q: 

Fast min on span

Given a list of arrays and lots of setup time I need to quickly find the smallest value in some sub-span of each array. In concept:

class SpanThing
{
     int Data;
     SpanThing(int[][] data) /// must be rectangulare
     {
         Data = data;
         //// process, can take a while
     }

     int[] MinsBruteForce(int from, int to)
     {
         int[] result = new int[data.length];
         foreach(int index, int[] dat; Data)
         {
             result[i] = int.max;
             foreach(int v; dat[from .. to]);
                result[i] = min(result[i], v);
         }
         return result;
     }
     int[] MinsSmart(int from, int to)
     {
          // same as MinsBruteForce but faster
     }
}

My current thinking on how to do this would be to build a binary tree over the data where each node contains the min in the related span. That way finding the min in the span for one row would consist of finding the min of only the tree nodes that compose it. This set would be the same for each row so could be computed once.

Does any one see any issues with this idea or known of better ways?


To clarify, the tree I'm talking about would be set up sutch that the root node would contain the min value for the entire row and for each node, it's left child would have the min value for the left half of the parent's span and the same for the right.

 0   5   6  2   7   9   4   1   7   2   8   4   2
 ------------------------------------------------
   | 5 | 6|   | 7 | 9 |   | 1 | 7 | 2 | 8 | 4 | 2  
 0 |   5  | 2 |   7   | 4 |   1   |   2   |   2  
   0      |   2       |   1       |       2
          0           |           1
                      0

This tree can be mapped to an array and defined in such a way that the segment bounds can be computed resulting in fast look-up.

The case I'm optimizing for is where I have a fixed input set and lots of up front time but then need to do many fast tests on a veriety of spans.

+2  A: 

Your described approach sounds like you're trying to do some sort of memoization or caching, but that's only going to help you if you're checking the same spans or nested spans repeatedly.

The general case for min([0..n]) is going to be O(n), which is what you've got already.

Your code seems to care more about the actual numbers in the array than their order in the array. If you're going to be checking these spans repeatedly, is it possible to just sort the data first, which could be a single O(n log n) operation followed by a bunch of O(1) ops? Most languages have some sort of built in sorting algorithm in their standard libraries.

Ryan Graham
A: 

It's not clear how we can represent the hierarchy of intervals efficiently using the tree approach you've described. There are many ways to divide an interval --- do we have to consider every possibility?

Would a simple approach like this suffice: Suppose data is an N x M array. I would create a M x M x N array where entry (i,j,k) gives the "min" of data(k,i:j). The array entries will be populated on demand:

int[] getMins(int from, int to)
{
    assert to >= from;

    if (mins[from][to] == null)
    {
        mins[from][to] = new int[N];

        // populate all entries (from,:,:)
        for (int k = 0; k < N; k++)
        {
            int t = array[k][from];

            for (int j = from; j < M; j++)
            {
                if (array[k][j] < t)
                    t = array[k][j];

                mins[from][j][k] = t;
            }
        }
    }

    return mins[from][to];
}
Zach Scrivena
+2  A: 

Your proposed solution appears to give the answer using constant space overhead, constant time setup, and logarithmic time for queries. If you are willing to pay quadratic space (i.e., compute all intervals in advance) you can get answers in constant time. Your logarithmic scheme is almost certain to be preferred.

It wouldn't surprise me if it were possible to do better, but I'd be shocked if there were a simple data structure that could do better---and in practice, logarithmic time is almost always plenty fast enough. Go for it.

Norman Ramsey
I I'm not actually sure that it is log time (at least in the average case). I haven't sat down and puzzled it out in detail but it could well be better. +1 anyway :)
BCS
Extend the array to size N=2**m with infinities. Binary search on the array until you find a point in the span (that's k steps). Then you have to take m-k steps to the left to get the small end of the span, and m-k to the right for the large end. Cost is between m and 2m, where m = log N.
Norman Ramsey
I'll have to think on that. I was just considering the per row cost and that has a lower bound of 1 if the span happens to match something in the tree. The worst case would be the span of all but the first and last element and that would be about 2 log n. More interesting would be the general case.
BCS