views:

65

answers:

3

In a multi-dimensional space, I have a collection of rectangles, all of which are aligned to the grid. (I am using the word "rectangles" loosely - in a three dimensional space, they would be rectangular prisms.)

I want to query this collection for all rectangles that overlap an input rectangle.

What is the best data structure for holding the collection of rectangles? I will be adding rectangles to and removing rectangles from the collection from time to time, but these operations will be infrequent. The operation I want to be fast is the query.

One solution is to keep the corners of the rectangles in a list, and do a linear scan over the list, finding which rectangles overlap the query rectangle and skipping over the ones that don't.

However, I want the query operation to be faster than linear.

I've looked at the R-tree data structure, but it holds a collection of points, not a collection of rectangles, and I don't see any obvious way to generalize it.

The coordinates of my rectangles are discrete, in case you find that helpful.

I am interested in the general solution, but I will also tell you the properties of my specific problem: my problem space has three dimensions, and their multiplicity varies wildly. The first dimension has two possible values, the second dimension has 87 values, and the third dimension has 1.8 million values.

+1  A: 

You can probably use KD-Trees which can be used for rectangles according to the wiki page:

Variations

Instead of points

Instead of points, a kd-tree can also contain rectangles or hyperrectangles[5]. A 2D rectangle is considered a 4D object (xlow, xhigh, ylow, yhigh). Thus range search becomes the problem of returning all rectangles intersecting the search rectangle. The tree is constructed the usual way with all the rectangles at the leaves. In an orthogonal range search, the opposite coordinate is used when comparing against the median. For example, if the current level is split along xhigh, we check the xlow coordinate of the search rectangle. If the median is less than the xlow coordinate of the search rectangle, then no rectangle in the left branch can ever intersect with the search rectangle and so can be pruned. Otherwise both branches should be traversed. See also interval tree, which is a 1-dimensional special case.

Moron
A: 

You have to use some sort of a partitioning technique. However, because your problem is constrained (you use only rectangles), the data-structure can be a little simplified. I haven't thought this through in detail, but something like this should work ;)

Using the discrete value constraint - you can create a secondary table-like data-structure where you store the discrete values of second dimension (the 87 possible values). Assume that these values represent planes perpendicular to this dimension. For each of these planes you can store, in this secondary table, the rectangles that intersect these planes.

Similarly for the third dimension you can use another table with as many equally spaced values as you need (1.8 million is too much, so you would probably want to make this at least a couple of magnitudes smaller), and create a map the rectangles that are between two chosen values.

Given a query rectangle you can query the first table in constant time to determine a set of tables which possibly intersects this query. Then you can do another query on the second table, and do an intersection of the results from the first and the second query results. This should narrow down the number of actual intersection tests that you have to perform.

tathagata
A: 

Let's call the original problem by PN - where N is number of dimensions.

Suppose we know the solution for P1 - 1-dimensional problem: find if a new interval is overlapping with a given collection of intervals.

Once we know to solve it, we can check if the new rectangle is overlapping with the collection of rectangles in each of the x/y/z projections.

So the solution of P3 is equivalent to P1_x AND P1_y AND P1_z.

In order to solve P1 efficiently we can use sorted list. Each node of the list will include coordinate and number-of-opened-intetrvals-up-to-this-coordinate.

Suppose we have the following intervals: [1,5] [2,9] [3,7] [0,2]

then the list will look as follows:

{0,1} , {1,2} , {2,2}, {3,3}, {5,2}, {7,1}, {9,0}

if we receive a new interval, say [6,7], we find the largest item in the list that is smaller than 6: {5,2} and smllest item that is greater than 7: {9,0}.

So it is easy to say that the new interval does overlap with the existing ones.

And the search in the sorted list is faster than linear :)