views:

155

answers:

3

Hello,

I'm looking for a data structure that provides indexing for Rectangles. I need the insert algorithm to be as fast as possible since the rectangles will be moving around the screen (think of dragging a rectangle with your mouse to a new position).

I've looked into R-Trees, R+Trees, kD-Trees, Quad-Trees and B-Trees but from my understanding insert's are usually slow. I'd prefer to have inserts at sub-linear time complexity so maybe someone can prove me wrong about either of the listed data structures.

I should be able to query the data structure for what rectangles are at point(x, y) or what rectangles intersect rectangle(x, y, width, height).

EDIT: The reason I want insert so fast is because if you think of a rectangle being moved around the screen, they're going to have to be removed and then re-inserted.

Thanks!

+1  A: 

This is perhaps an extended comment rather than an answer.

I'm a bit puzzled about what you really want. I could guess that you want a data structure to support quick answers to questions such as 'Given the ID of a rectangle, return its current coordinates'. Is that right ?

Or do you want to answer 'what rectangle is at position (x,y)' ? In that case an array with dimensions matching the height and width of your display might suffice, with each element in the array being a (presumably short) list of the rectangles on that pixel.

But then you state that you need an insert algorithm to be as fast as possible to cope with rectangles moving constantly. If you had only, say, 10 rectangles on screen, you could simply have a 10-element array containing the coordinates of each of the rectangles. Updating their positions would not then require any inserts into the data structure.

How many rectangles ? How quickly are they created ? and destroyed ? How do you want to cope with overlaps ? Is a rectangle just a boundary, or does it include the interior ?

High Performance Mark
I've edited my question to clearly explain what I want.
TheCloudlessSky
A: 

I'd use a multiscale grid approach (equivalent to quad-trees in some form).

I'm assuming you're using integer coordinates (i.e. pixels) and have plenty of space to hold all the pixels.

Have an array of lists of rectangles, one for each pixel. Then, bin two-by-two and do it again. And again, and again, and again, until you have one pixel that covers everything.

Now, the key is that you insert your rectangles at the level that is a good match for the size of the rectangle. This will be something like (pixel size) ~= min(height,width)/2. Now for each rectangle you have only a handful of inserts to do into the lists (you could bound it above by a constant, e.g. pick something that has between 4 and 16 pixels).

If you want to seek for all rectangles at x,y you look in the list of the smallest pixel, and then in the list of the 2x2 binned pixel that contains it, and then in the 4x4 etc.; you should have log2(# of pixels) steps to look through. (For larger pixels, you then have to check whether (x,y) was really in the rectangle; you expect about half of them to be successful on borders, and all of them to be successful inside the rectangle, so you'd expect no worse than 2x more work than if you looked up the pixel directly.)

Now, what about insert? That's very inexpensive--O(1) to stick yourself on the front of a list.

What about delete? That's more expensive; you have to look through and heal each list for each pixel you're entered in. That's approximately O(n) in the number of rectangles overlapping at that position in space and of approximately the same size. If you have really large numbers of rectangles, then you should use some other data structure to hold them (hash set, RB tree, etc.).

(Note that if your smallest rectangle must be larger than a pixel, you don't need to actually form the multiscale structure all the way to the pixel level; just go down until the smallest rectangle won't get hopelessly lost inside your binned pixel.)

Rex Kerr
A: 

The data structures you mention are quite a mixed bag: in particular B-Trees should be fast (cost to insert grows with the logarithm of the number of items present) but won't speed up your intersection queries.

Ignoring that - and hoping for the best - the spatial data structures come in two parts. The first part tells you how to build a tree structure from the data. The second part tells you how to keep track of information at each node that describes the items stored below that node, and how to use it to speed up queries.

You can usually pinch the ideas about keeping track of information at each node without using the (expensive) ideas about exactly how the tree should be built. For instance, you could create a key for each rectangle by bit-interleaving the co-ordinates of its points and then use a perfectly ordinary tree structure (such as a B-tree or an AVL tree or a Red-Black tree) to store it, while still keeping information at each node. This might, in practice, speed up your queries enough - although you wouldn't be able to tell that until you implemented and tested it on real data. The purpose of the tree-building instructions in most schemes is to provide performance guarantees.

Two postscripts:

1) I like Patricia trees for this - they are reasonably easy to implement, and adding or deleting entries does not disturb the tree structure much, so you won't have too much work to do updating information stored at nodes.

2) Last time I looked at a window system, it didn't bother about any of this clever stuff at all - it just kept a linear list of items and searched all the way through it when it needed to: that was fast enough.

mcdowella