views:

2921

answers:

9

Range intersection is a simple, but non-trivial problem.

Its has been answered twice already:

The first solutions is O(n) and the second solution is for a database (which is less than O(n) of course).

I have the same problem, but for a large n and I am not within a database.

This problem seems to be very similar to http://stackoverflow.com/questions/303243/store-2d-points-for-quick-retrieval-of-those-inside-a-rectangle but I don't see how it maps.

So what data structure would you store the set of ranges in, such that a search on a range costs less than O(n)? (Extra credit for using libraries available for Java)

EDIT:

I want to get a subset of all intersecting ranges, meaning the search range could intersect multiple ranges.

The method that needs to be less than O(n) in Java is:

public class RangeSet {
    ....
    public Set<Range> intersects(Range range);
    ....
}

Where Range is just a class containing a pair of int start and end.

This is not an impossible question, I already have the solution, I just wanted to see if there was a more standard/simpler way of doing it

A: 

Just as a quad tree works for a set of 2d points, a simple binary tree should work for this case. Build a tree with your ranges.

To explain further: Each node in the tree contains two integers, the beginning and end of the range, and the two children if it's not a leaf node. To find the ranges that your input range spans, then starting at the top of the tree

  - if the node range intersects the input range:
     - if it's a leaf node, then add the range to your result list
     - if it's not a leaf node, then traverse down to the child nodes and repeat this process.

It should be O(logN)

Further detail: The binary tree would be structured like a 1-d version of a quad tree. Each node would have three integers (sorry I said two above, but now I realize you need three), the lowest representing the lowest value of the lowest range that's below this node, the highest representing the highest value of the highest range that's below this node, and the pivot. The left child would span from this node's lowest to its pivot. The right child would span from this node's pivot to this node's highest. If there is only one range that goes from "lowest" to "highest", you wouldn't have a pivot and this would be a leaf. Ideally you'd pick the pivots for each node to keep the tree balanced.

Paul Tomblin
Each range has 2 dimensions. I don't see how a binary tree would work.
Pyrolistical
Thanks for adding more detail, I don't understand how your tree would be structured.What is the parent/child relationship in your binary tree?
Pyrolistical
A: 

Sounds like you need a class that implements the SortedSet interface. TreeSet is the implementation that ships with the core API.

Have one set holding the ranges sorted by lowest value, and one sorted by highest value.

You can then implement the equivalent of the database algorithm using the in-memory sets.

As for whether this is actually faster than O(n), I couldn't say.

Bill Michell
I came to the same conclusion, but I want to see if there's a better way. This solution either works out to be O(log(n)) or O(log^2(n)). I am sure how much it cost to find the intersection between the two subsets.
Pyrolistical
+1  A: 

This depends on your exact problem, in the linked question, the ranges where distinct, no common part, and the searched ranged could span multiple ranges. If your problem is the same, its is really easy: Take an array of the ranges, sort them by their lowest values (since they do not overlap, this would be also the same order as sorted by their upper values).

Now just make a binsearch for the your target lower value (or smaller if not exact) and one for the target upper value(or bigger if not exact). The resulting indexes are the the ranges which are coverd. You have to check wheter the ranges at the indexes itself are in- or excluded, but that are just 2 checks. Overall complexity O(log n).

flolo
O(log(n)) only if the set is already sorted, or else its for sorting O(nlog(n))
Pyrolistical
You are completly right, but from the question it looks like the range set wouldnt change much, so this has to be done just once.
flolo
Yes, you just could have said that the set of ranges is a data type such that its sorted on the lower and upper values
Pyrolistical
A: 

When I had this problem, I used a sorted array of ranges and a binary search to look for intersections. This is (I believe) O(log n) performance, with a little bit of overhead to deal with overlapping ranges.

The answer to your question is, I think, derivable from the code below, but stopping short of the insertion. I present the entire code to avoid confusion by the differing context - I needed to insert a range of Unicode codepoints into a list of codepoint ranges.

-- EDIT --

Adapting the code below to determine intersections of multiple ranges involves a trivial forward search from the insertion point until a range is found which no longer intersects.

-- END EDIT --

The Range class contains:

final int                               lower;                                  // lower end of range
final int                               upper;                                  // upper end of range

public int compareTo(Object obj) {
    if(obj==null) { return -1; }

    Range                           oth=(Range)obj;

    if(lower<oth.lower) { return -1; }
    if(lower>oth.lower) { return  1; }
    if(upper<oth.upper) { return -1; }
    if(upper>oth.upper) { return  1; }
    return 0;
    }

Range Insertion:

public Builder addRange(int fir, int las) {
    if(fir!=-1) { fir&=0x001FFFFF; }
    if(las!=-1) { las&=0x001FFFFF; }

    if(codepoints==null || codepoints.length==0) {
        codepoints=new Range[]{new Range(fir,las)};
        }
    else {
        int                         idx=Range.findChar(codepoints,fir);
        int                         ins=(idx<0 ? -(idx+1) : idx);

        if(idx<0) {
            if     (ins>0                 && fir==(codepoints[ins-1].upper+1)) { idx=(ins-1); }  // new range adjoins the following range (can't overlap or idx would be >=0)
            else if(ins<codepoints.length && las>=(codepoints[ins  ].lower-1)) { idx=ins;     }  // new range overlaps or adjoins the following range
            }

        if(idx<0) {
            codepoints=(Range[])Util.arrayInsert(codepoints,ins,new Range(fir,las));
            }
        else {
            boolean                 rmv=false;

            for(int xa=(idx+1); xa<codepoints.length && codepoints[xa].lower<=las; xa++) {
                if(las<codepoints[xa].upper) { las=codepoints[xa].upper; }
                codepoints[xa]=null;
                rmv=true;
                }
            if(codepoints[idx].lower>fir || codepoints[idx].upper<las) {
                codepoints[idx]=new Range((codepoints[idx].lower < fir ? codepoints[idx].lower : fir),(codepoints[idx].upper>las ? codepoints[idx].upper : las));
                }
            if(rmv) { codepoints=Range.removeNulls(codepoints); }
            }
        }
    return this;
    }

Binary Search:

static int findChar(Range[] arr, int val) {
    if(arr.length==1) {
        if     (val< arr[0].lower) { return -1; }                             // value too low
        else if(val<=arr[0].upper) { return  0; }                             // value found
        else                       { return -2; }                             // value too high
        }
    else {
        int                             lowidx=0;                               // low index
        int                             hghidx=(arr.length-1);                  // high index
        int                             mididx;                                 // middle index
        Range                           midval;                                 // middle value

        while(lowidx<=hghidx) {
            mididx=((lowidx+hghidx)>>>1);
            midval=arr[mididx];
            if     (val< midval.lower) { hghidx=(mididx-1); }                   // value too low
            else if(val<=midval.upper) { return mididx;     }                   // value found
            else                       { lowidx=(mididx+1); }                   // value too high
            }
        return -(lowidx+1);                                                     // value not found.
        }
    }
Software Monkey
I think your problem only has 1 intersecting range, I want a subset of all intersecting ranges. I updated the question to reflect this.
Pyrolistical
Yes, because I am folding intersecting ranges together to create a single larger range; but with multiple ranges, a simple linear search from the hit backwards and forwards will locate the adjacent multiple ranges.
Software Monkey
+2  A: 

Non Overlapping Ranges:

Prep O(n log n):

  1. Make a array / vector of the ranges.
  2. Sort the vector by the end of the range (break ties by sorting by the start of the range)

Search:

  1. Use binary search to find the first range with an End value of >= TestRange.Start
  2. Iterator starting at the binary search until you find an Start > TestRange.End:

    2a. If the range if the current range is within the TestRange, add it to your result.

Adam Tegen
I think you got it, its so simple.
Pyrolistical
This is better than my solution.
Paul Tomblin
This won't work since the ranges could have very different lengths. One short could fall outside the query and stop the iterator, and the next long one (ordered by the end-coordinate) could still fall inside, and would thus be missed.
MizardX
Wait, missed the topic. For non overlapping ranges this would of course work.
MizardX
But the iteration phase is still O(n) as in the worst case your query intersects every range so you iterate over them all.
Andrew Beyer
+9  A: 

Edit: It sounds like this solution is more or less an Interval Tree. A more complete implementation of an Interval Tree can be found here.

class TreeNode
{
public:
    long pivot;
    List<Range> leaves;  //Any ranges that intersect the pivot
    TreeNode left;        //Tree nodes that fall to the left of the pivot
    TreeNode right;       //Tree nodes that fall to the right of the pivot
};

Prep O(n log n):

  1. Create the list of ranges
  2. Pick the pivot points (possibly by using a sorted list of the end dates.) ??
  3. Build your tree.

Search:

  1. Use binary search to find the first pivot that is >= TestRange.End
  2. Traverse the tree until the pivot > TestRange.Start

    2a. Add the leaves to your result.


Example:

Ranges:

  • 0 - 2
  • 1 - 2
  • 2 - 3
  • 1 - 4
  • 2 - 4
  • 0 - 5
  • 4 - 5
  • 2 - 6
  • 3 - 7

Tree:

                             4
               --------------+------------------
               3             |                 7
               |            1-4                |
               |            2-4                |
               |            0-5                |
               |            4-5                |
      ---------+------                 --------+--------
      2        |    null              6        |       null
 -----+----   2-3                 ----+----   3-7
null  |  null                   null  |  null    
     0-2                             2-6
     1-2
Adam Tegen
A: 

Overlapping Ranges:

Prep O(n log n):

  1. Make a array / vector of the ranges.
  2. Sort the vector by the end of the range (break ties by sorting by the start of the range)
  3. Make a second vector of ints. This represents the point at which you can stop searching.

    int stop[size];
    stop[size-1] = Ranges[size - 1].start;
    for (int i = size - 2; i >= 0; i--)
    {
        stop[i] = min(Ranges[i].start, stop[i+1]);
    }
    

Search:

  1. Use binary search to find the first range with an End value of >= TestRange.Start
  2. Iterator starting at the binary search until stop[i] > TestRange.End:

    2a. If the range if the current range is within the TestRange, add it to your result.

Adam Tegen
+6  A: 

The standard approach is to use an interval tree.

Rafał Dowgird
This is the answer I was looking for
Pyrolistical
A: 

If ranges overlap, and one wants to retrieve all the ranges that overlap (or contain) a given target range, most of the solutions above don't appear to work.

As some have pointed out, if (worst-case) all the ranges happen to intersect the target range (for example, if the target range is {0..MAXINT} or similar) then of course it takes O(n) to return the n ranges.

But isn't the interesting and typical/average case, where only a very small % of the n total ranges do intersect the target range? Call the number that do intersect "m" -- in that case, you might conceivably be able to do as well as O(m). And if n=10^9 and m=10, that's a make-or-break difference.

Consider the simple case of a text document that has various regions marked up for their "type" -- perhaps you want to find all the marked-up units that contain or intersect a given contiguous range of text (for example, a paragraph). In HTML, XML, or similar those can only be ancestors of the text-node(s) containing at least some characters of the target range. In typical representations with parent pointers in each node, that's O(m) -- way better than O(n), especially because m is (for short or synchronous target ranges) merely the tree nesting depth, which tends to be even lower than ln(n) because big XML documents in practice get bushier not deeper.

The interesting case is harder: what if your "elements" do not form a tree as in XML, but can overlap as in MECS, CLIX, LMNL, and some other systems? You still want to find all the regions/"elements" that overlap your target, but they aren't so easily organized.

On the other hand, you should be able to do very well because marked-up ranges in many applications are most frequently small -- there are far more words, sentences, and paragraphs in a book, than there are chapters. So even though there may be a huge number of ranges that start before the target and a huge number that end after it, the intersection will be very small on average.

I think that's what the original questioner was getting at, and I'm afraid i didn't see an answer that addresses that problem. If it's not what the original question was about, then I'd like to pose it as a new question.

TextGeek