views:

462

answers:

10
+17  Q: 

Rectangles Covering

I have N rectangles with sides parallel to the x- and y-axes. There is another rectangle, model. I need to create an algorithm that can tell whether the model is completely covered by the N rectangles.

I have some ideas. I think that first, I need to sort the rectangles by their left side (it can be done in O(n log n) time), and then use a vertical sweeping line.

alt text

The blue rectangle is the model.

First of all, I need the abstract algorithm. There are no special requirements with regard to the realization. A rectangle can be represented as a pair of points (left-top and bottom-right).

This is one of the tasks for preparing for a test. I know that the best algorithm can do this in O(n log n) time.

A: 

Hard to know what you are looking for but it sounds to me like an R-tree might work?

jk
It seems useful, but i have no full picture of how it can be done.
den bardadym
A: 

OK, I've asked enough questions, here's something of an answer ...

If the data is represented as a raster one algorithm is trivial:

  1. Create a boolean array representing the model (ie Blue) rectangle. Set all elements to FALSE (representing 'not covered')
  2. For each Red rectangle (ignore the ones which can't overlap the Blue) set all elements of the array to TRUE if they are inside the Red rectangle.
  3. Finally, check whether or not every element of the array has been set TRUE.

If the data is vector it's a little more complicated. First define a function which returns the rectangle representing the intersection (if any) of two rectangles. This is simple. Then proceed:

  1. Create a variable called 'UnCoveredRectangle' which is initialised to be the same as the Blue rectangle.
  2. Again, only bother with the Red rectangles which intersect the Blue one. For each Red rectangle, compute the intersection of the rectangle with the UnCoveredRectangle. The intersection will result in one of the following situations:

    2.1 The intersection equals the UnCoveredRectangle. The job is finished.

    2.2 The intersection 'bites' a rectangular chunk out of the CoveredRectangle. The remaining UnCoveredRectangle will be either a rectangle, an L-shaped piece, or a U-shaped piece. If it is a rectangle itself, set UnCoveredRectangle to be the remaining rectangle, and go to step 2. If the UnCoveredRectangle is L- or U-shaped, split it into 2, or 3, rectangles and recurse from step 2..

  3. If you run out of Red rectangles before the area of (part of) UnCoveredRectangle is sent to 0, you've finished.

OK I haven't got a clue about the complexity of this algorithm, but unless the number of rectangles is huge, I'm not too concerned, though perhaps @den is. And I've omitted a lot of details. And I can't draw nice diagrams like @den did, so you'll have to picture it for yourselves.

High Performance Mark
it is decision, but complexity...
den bardadym
If you don't sort the rectangles by position, you have different sorts of results: it can be an `O` (cut to 4), or even `||`, split to two. Sorting also helps, as you can know early the rectangles don't fully cover the area.
Kobi
As for complexity - basically, you take the model as a polygon (with more than one path - it may be split), and subtract each rectangle. The polygon grows more and more complex, so I believe we're looking at O(N²). The opposite is about that same: you can unite all rectangles, and then add the model and see if it makes a difference.
Kobi
It is more complex than O(N^2)
den bardadym
+1  A: 

Here's a generic algorithm

  1. is there any rectangle covering the whole model?
    if yes you are done
  2. if no are there any rectangles partially covering the model?
    if yes
  3. is the union of all the intersections of rectangle and model equal to the model
    if 2. or 3. is no then the answer is no, otherwise it is yes

Now the question is how to do the above efficiently. The above can be done in a single loop over all polygons, so I think you are looking at O(n) time.

If you need to create efficient algorithm that will test multiple models, or if you must optimize for fastest answer possible (at the expense of preparing the data) then you are looking for a structure that will allow quick answer to question if a rectangle intersects (or contains) a rectangle.

If you use, for example kd-trees, I believe that this can be answered in O(log n) time - but the important variable in this algorithm is also the number of found rectangles k. You will end up with something like O(k + log n) and you will also need to do the union part to test if the model is covered.

My guess is that the union could be computed in O(k log k), so if k is small then you are looking at O(log n) and if k is comparable to n then it should be O(k log k).

See also this question.

EDIT: In response to complexity of intersections and unions.

In more details, we have two scenarios depending on if k << n or k comparable to n

a) in case of k << n and assuming polynomial complexity for intersection/union (so here I give up the guess O(k log k) ) we have:

  • log n to find a range in kd indexed tree (of rectangles),
  • k steps to retrieve all the rectangles in that 'branch',
  • f(k) some polynomial time to calculate intersections and union

The total is O(k + log n + f(k)), which is directly equal to O(log n) since big O depends only on the fastest growing term.

In this case the more significant part of the algorithm is finding the polygons.

b) in the case of k comparable to n we must take a look at the complexity of intersection and union algorithms (notice the parallel here on how the RDBMs, depending on selectivity, might use index or do table scan; it is a similar choice to what we have here).
If k is big enough the algorithm becomes less of an algorithm for finding rectangles that intersect with the rectangle and becomes more of an algorithm for calculating the union of polygons.

But, i believe that the kd tree can also help in finding the intersection of segments (which are necessary to build the union), so even this part of algorithm might not need k^2 time. At this point I would investigate efficient algorithms for calculating the area of unions.

Unreason
Are you sure you can merge N intersections in O(N) time? The problem in your third point the essentially same as the original question, 1 and 2 are just trivial cases you can solve in O(N). The complexity of the polygon will increase as you unite more rectangles, leading to a O(N²) operation. I should say, though, I have limited experience, and may be thinking of a naive approach.
Kobi
No, i said that efficient algoritm can do this by O(n log n).
den bardadym
@Kobi: Union of the intersections will happen in O(f(k)) time, for k << n, where k is the number of polygons that intersect with the model, this is not the fastest growing term and gets dropped. For the analysis of the merge/union algorithm check the EDIT in the answer.
Unreason
I'm sorry, but I don't think adding k helps here at all. You're trying to reduce the problem from n to k, but end up with the same problem! In fact, you might as well assume all rectangles intersect with the model; filtering the other rectangles is trivial, and could be done at the first step for "free" (that is, O(N) on a O(NlogN) algorithm). Of course the solution is simple if the relevant input is much smaller, but complexity is about worst case...
Kobi
@Kobi, after thinking a bit I think you are right, it seems I went on a tangent there; but still I didn't say anything wrong, just useless :) (well actually I analyzed only a special case, and in that case you get O(log n) complexity, if you don't count O(n log n) for building a kd-tree). +1 for your comment
Unreason
+1  A: 

You're on the right track with the sweep line. Conceptually, we want to detect when intersection of the model with the sweep line is not covered by the other rectangles. The high-level template is to break each rectangle into a "left edge" and a "right edge" event, sort the events by x coordinate (putting lefts before rights if the rectangles are closed and rights before lefts if they are open), and then process each event in O(log n) time. This is basically homework, so I will say no more.

It is the best adnice what i see on my question, thank. i am thinking in this way as well as you.
den bardadym
Could you describe how to handle the event? How do you handle it in O(log n)? That's the most important thing, it's kind of obvious you have to sort... @den: I suggest you wait some more before you accept any answer, no matter how helpful. Give others a chance to post something as well.
IVlad
@IVlad: as with most sweep line algorithms, it involves a balanced binary tree.
@algorithmist: that's still not really helpful. You're only reproducing theoretical texts, the question is how to use it on this particular problem.
IVlad
that is what sweep line means
Timmy
Homework. Seriously. Stop asking.
its not homework for me, and i cannot figure it out
Timmy
@Timmy: you're not the only one. @algorithmist: -1 from me for contributing precisely nothing but textbook definitions.
IVlad
`@algorithmist`: Having seen spot on answers of yours previously I am fairly confident that you do have a solution. Since this question is not tagged homework and we are quite a good number of people being clueless... would you mind being more precise ?
Matthieu M.
I've found out how to deal with the "event" in `O(log n)`. However I've realized that any `sweep line` algorithm is bound to be at least `O(n**2)` in the worst case because I don't know of any sorting algorithm that does not degenerate for some particular input.
Matthieu M.
+1  A: 

There is a trivial O(N^2) approach that is similar to the "raster" approach that is brought up. Since all the rectangles are axis-parallel, there can only be at most 2N distinct x and y dimension. Sort all the x's and y's and remap: x_i -> i. So now you have a 2N x 2N matrix where you can easily use the naive O(N^2) algorithm.

Larry
+1 for mentioning remapping of the coordinates, -1 because this is still quadratic.
IVlad
True, I thought it would help. No one has a "real" `O(N log N)` solution posted, and the op did not appear to require that solution. I had also written some things about the obvious events and its handling, but couldn't get down to O(log N) check - O(k) output sensitive at best. In either case, this is an entirely feasible low constant solution that is practical and easy to implement.I don't think it's necessary that we only post the "optimal" solution, otherwise Chazelle's `O(N)` simple polygon triangulation will not leave room for the `O(N log N)` algorithms, despite being infeasible.
Larry
Besides, I do not hide this fact. =) And I believe it's the simplest `O(N^2)` solution - let me know otherwise.
Larry
I also agree that it's the simplest. However, I'm still hoping someone will figure out the `O(n log n)` solution eventually. I'm very interested in it, since nothing I tried seems to get me anywhere.
IVlad
Me too, but until then! =) I've played with a few data structures using the obvious sweep line algorithm and nothing came up.
Larry
+4  A: 

Here's a divide and conquer algorithm. AVERAGE case complexity is very good and I'd say it's something like O(n log MaxCoords). Worst case could be quadratic though, however I think such a test would be pretty difficult to create. To make it even harder, make the order of the recursive function calls random. Maybe what @Larry suggested can get this to O(n log n) on average.

I can't figure out a sweep line solution, but for the tests I've tried this is very fast.

Basically, use a recursive function the works on the blue rectangle. First check if the blue rectangle is completely covered by one of the other rectangles. If yes, we're done. If not, split it into 4 quadrants, and recursively call the function on those quadrants. All 4 recursive calls must return true. I'm including some C# code that draws the rectangles. You can change it to work with larger values as well, but do remove the drawing procedures in that case, as those will take forever. I've tests it with a million rectangles and a square of side one billion generated such that it isn't covered, and the provided answer (false) took about a second on a quadcore.

I've tested this on random data mostly, but it seems correct. If it turns out it isn't I'll just delete this, but maybe it'll get you on the right path.

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    List<Rectangle> Rects = new List<Rectangle>();

    private const int maxRects = 20;

    private void InitRects()
    {
        Random rand = new Random();

        for (int i = 0; i < maxRects; ++i) // Rects[0] is the model
        {
            int x = rand.Next(panel1.Width);
            int y = rand.Next(panel1.Height);

            Rects.Add(new Rectangle(new Point(x, y), new Size(rand.Next(panel1.Width - x), rand.Next(panel1.Height - y))));
        }
    }

    private void DrawRects(Graphics g)
    {
        g.DrawRectangle(Pens.Blue, Rects[0]);
        for (int i = 1; i < Rects.Count; ++i)
        {
            g.DrawRectangle(Pens.Red, Rects[i]);
        }
    }

    private bool Solve(Rectangle R)
    {
        // if there is a rectangle containing R
        for (int i = 1; i < Rects.Count; ++i)
        {
            if (Rects[i].Contains(R))
            {
                return true;
            }
        }

        if (R.Width <= 3 && R.Height <= 3)
        {
            return false;
        }

        Rectangle UpperLeft = new Rectangle(new Point(R.X, R.Y), new Size(R.Width / 2, R.Height / 2));
        Rectangle UpperRight = new Rectangle(new Point(R.X + R.Width / 2 + 1, R.Y), new Size(R.Width / 2, R.Height / 2));
        Rectangle LowerLeft = new Rectangle(new Point(R.X, R.Y + R.Height / 2 + 1), new Size(R.Width / 2, R.Height / 2));
        Rectangle LowerRight = new Rectangle(new Point(R.X + R.Width / 2 + 1, R.Y + + R.Height / 2 + 1), new Size(R.Width / 2, R.Height / 2));

        return Solve(UpperLeft) && Solve(UpperRight) && Solve(LowerLeft) && Solve(LowerRight);
    }

    private void Go_Click(object sender, EventArgs e)
    {
        Graphics g = panel1.CreateGraphics();
        panel1.Hide();
        panel1.Show();
        Rects.Clear();

        InitRects();
        DrawRects(g);

        textBox1.Text = Solve(Rects[0]).ToString(); 
    }
IVlad
Really very great thanks. It is first time when i got such answer. I like your approach to evaluation other answers. It is very right, imho.
den bardadym
Sweeping line needed complex implementation using ie interval trees or so on.
den bardadym
If you ever find a sweep line solution, please post it if you don't mind :). I'm very much interested in how it works.
IVlad
Worst case would be no intersections, so it runs through the full recursion. (...Or exactly one "1 coordinate" rectangle covering every single spot on the model rectangle, or some combination thereof.) In the worst case, it will be O(r * k log k), with r = number of rectangles, and k being the number of coordinates on the model rectangle. Doesn't really work with floating point coordinates, because you have to declare an arbitrary accuracy (the size of the "coordinates", in your case 3x3 blocks), but very nice nonetheless. :)
Tanzelax
I know all about interval trees, Fenwick trees, and a few other variations in comp geo, but maybe I'm too silly enough to match. I also have some expected `O( N log N )` algorithms that are worst case `O(N^2)`. I'm really curious about the answer!
Larry
So am I, I've been trying to elaborate on a `sweep line` + `intervals` solution (see below) but I can't wrap my head around it... `@Larry` could you expose the complexity of `insertion`, `removal` and `query` on interval trees ? It seems to me they are `O(log n)` in average but can degenerate to `O(n)` in the worst case. I do wonder if we are not talking average complexity, it would not be the first time that complexity is used vaguely.
Matthieu M.
The sweep line approach failing, I've reverted to a classic `Divide And Conquer` approach but my division of the search space is based on the input rather than being fixed, which should cull a number of pathological inputs. I would appreciate your criticisms.
Matthieu M.
A: 

Here's a way to do this without using rasterization, that is, using only pure numbers.

Note: This is not O(n log n), more like O(n^2). It is, however, a solution. Whether it is an answer, probably not if O(n log n) is a requirement.

  1. Create a list of all the unique X-coordinates of the left and right edges (in the same list)
  2. When you build this list, for each coordinate, also associate a list of rectangles with it, and denote in this list whether the X-coordinate the list is associated with is the left or the right edge of the rectangle
  3. Sort the list of coordinates so that it is ascending, going from left to right
  4. Create a new list of rectangles, initially empty
  5. Run through the list of coordinates, and process the associated list of rectangles for it
  6. All rectangles in the associated list that are denoted as having the coordinate as the left edge should be added to the list you created in 4.
  7. All rectangles with the coordinate as the right edge should be removed.
  8. The order of adding and removing should actually be done in the opposite order, first you want to remove the right-edge-rectangles, and then you add all the left-edge-rectangles
  9. Now, before you remove the rectangles from the list you created in 4, you want to process them. You process them by treating them as "sub-rectangles". Their "new" left edge coordinate is the X-coordinate you processed the previous iteration of the loop (in 5), and the new right edge is the current X-coordinate being processed
  10. The output of the algorithm is a collection of coordinate-pairs (the two X-coordinates left and right), each pair having an associated list of rectangles (the vertical strip)

The output should thus be:

  1. X=1..2, Rects=A
  2. X=2..3, Rects=A, B
  3. X=3..4, Rects=A, B, C
  4. X=4..5, Rects=A, C
  5. X=5..6, Rects=C

Let me illustrate the process so far

+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+----------+-----+
   |          |
   +----------+

^  ^     ^    ^     ^     ^
1  2     3    4     5     6  <-- X-coordinates

Associated rectangles:

  1. left: A, right: (none)
  2. left: B, right: (none)
  3. left: C, right: (none)
  4. left: (none), right: B
  5. left: (none), right: A
  6. left: (none), right: C

You now create an empty list, L=[], and start processing the coordinates and associated rectangles:

X=1

List is empty, nothing to process Nothing to remove Add A: L=[ A ]

X=2

List contains rectangles, process list as rectangles that have a left edge of X=1, and a right edge of X=2 (the two coordinates we've processed so far), and use their original top and bottom coordinates. Nothing to remove. Add B: L=[ A, B ]

X=3

List contains rectangles, process list (both A and B) the same way, treat them as temporarily having left and right coordinates as X=2 and X=3, and use their original top and bottom coordinates. Nothing to remove Add C: L=[ A, B, C ]

X=4

Process the three rectangles the same way as above, temporary left and right coordinates are X=3 and X=4 Remove B: L=[A, C ] Nothing to add

X=5 and X=6

Process these in the exact same manner.

This means you will end up with "strips" of rectangles, like this (I've pulled them a bit apart to clearer illustrate that they are strips, but they are located side-by-side continously like in the original diagram):

+--+ +-----+ +----+ ------+
|A | |     | |    | |     |
|  | |     | +----+ +-----+ +-----+
|  | |     | |C   | |     | |     |
|  | +-----| +----+ |     | |     |
|  | |B    | |    | |     | |     |
|  | |     | +----+ +-----| +-----+
|  | |     | |    | |     |
+--+ +-----+ +----+ +-----+
     |     | |    |
     +-----+ +----+
^  ^ ^     ^ ^    ^ ^     ^ ^     ^
1  2 2     3 3    4 4     5 5     6

Ok, so now you have your output, a collection of coordinate-pairs, each pair having an associated list of rectangles.

Now we do a trick. We process the vertical strip in the exact same manner, only this time we use the Y coordinates as the delimiters. Let's handle the strip between 3 and 4 above. Remember that the strip has a left and right coordinates of 3 and 4.

Strip looks like this:

^     +----+ <-- 1
A     |    |
|   ^ +----+ <-- 2
|   C |    |
| ^ | +----+ <-- 3
| B | |    |
| | V +----+ <-- 4
| |   |    |
V |   +----+ <-- 5
  |   |    |
  V   +----+ <-- 6

List of coordinates with rectangles:

  1. Top=A, Bottom=(none)
  2. Top=C, Bottom=(none)
  3. Top=B, Bottom=(none)
  4. Top=(none), Bottom=C
  5. Top=(none), Bottom=A
  6. Top=(none), Bottom=B

New empty list L=[]

Process the coordinates:

Y=1

Nothing to process (L=[]) Add A to list, L=[ A ]

Y=2

Process A with temporarily having a top and bottom coordinates of Y=1 and 2 (and remember that it also has a temporary left and right coordinates of X=3 and 4 Add C, L=[ A, C ]

Y=3

Process A and C, both now having temporary coordinates of (3, 2)-(4, 3) Add B, L=[ A, B, C ]

Y=4

Process A, B and C, all having coordinates of (3, 3)-(4, 4) Remove C, L=[ A, B ]

Y=5

Process A and B, coordinates (3, 4)-(4, 5) Remove A, L=[ B ]

Y=6

Process B, coordinates (3, 5)-(4, 6)

Final output:

pairs of Y-coordinates, with rectangles associated with them (that also have temporarily got new X-coordinates):

  • (3, 1)-(4, 2) - A
  • (3, 2)-(4, 3) - A, C
  • (3, 3)-(4, 4) - A, B, C
  • (3, 4)-(4, 5) - A, B
  • (3, 5)-(4, 6) - B

Now, suppose you want to ask the question: Is B fully covered by all any combination of the other rectangles.

The answer can be worked out as follows:

  1. Loop through all the rectangles in the final list above
  2. If B is NOT part of the associated rectangle, ignore this rectangle
  3. If there is any other of the original rectangles associated with the coordinates, then this part of B is covered by that/those rectangle(s)
  4. If there is no other of the original rectangles associated with the coordinates, then this part of B is not covered.

In the above example, we see that the 3rd and 4rd rectangle in the final list contains B, but also contains other rectangles, hence those parts of B is covered, but the final rectangle in the list also contains B, but no other rectangle, hence this part is not covered.

According to the original diagram, the final result would include rectangles as follows (the letters inside each denote which original rectangle is associated with this new rectangle):

+--+-----+----+-----+
|A |A    |A   |A    |
|  |     +----+-----+-----+
|  |     |AC  |AC   |C    |
|  +-----+----+     |     |
|  |AB   |ABC |     |     |
|  |     +----+-----+-----+
|  |     |AB  |A    |
+--+-----+----+-----+
   |B    |B   |
   +-----+----+

If we now take a new look at the original diagram, I have shaded out the parts that the above algorithm would find contains B, but no other rectangle:

+-------------------+
|A                  |
|        +----------+-----+
|        |C         |     |
|  +-----+----+     |     |
|  |B    |    |     |     |
|  |     +----+-----+-----+
|  |          |     |
+--+-----+----+-----+
   |#####|####|
   +-----+----+

The vertical bar in the middle there is to illustrate that the part would be returned as two rectangles, split at that location, due to the way the vertical strips were worked out.

I seriously hope I made myself understood here. I have some code that can help you with the implementation of each run through the lists of coordinates, but it's 01:21 past midnight here and I'm going to bed, but leave a comment if you wish to see some actual code for this.

Or it would be a great exercise to implement it yourself :)

Here's the link to the class containing the method in question: RangeExtensions.cs.

The method is the Slice method, just search the page for it. The type in question, Range, is basically a range from one value to another, so there's a bit of data construction and maintenance in addition to the above algorithm.

Lasse V. Karlsen
Wouldn't the final "list" be of size N^2 (O(N) vertical strips, each divided O(N) times), thus making the final step of iterating to find orphaned B's O(N^2)?
Tanzelax
first you process N things sidewise, then N things downwards, then the final result, so it is before truncation O(N^2+a*N), so yes, it is O(N^2). I didn't say it was O(n log n), sorry for not mentioning that it wasn't. It is *a* solution, a fairly fast one, I had a form full of overlapping rectangles where the color of each small part was related to the number of rectangles covering that part, built when I was originally optimizing my implementation for this partioning problem. I'll post a link to the code that deals with this.
Lasse V. Karlsen
Looks like http://stackoverflow.com/questions/244452/what-is-an-efficient-algorithm-to-find-area-of-overlapping-rectangles/298269#298269
Will
My answer above is the same as my answer in for that question: http://stackoverflow.com/questions/244452/what-is-an-efficient-algorithm-to-find-area-of-overlapping-rectangles/245078#245078, uses the same code. The difference between my implementation and that of the link, is that my method takes an ordered source of ranges, and produces the slices, and the source of ranges does not have to be an in-memory list, it can be a streaming source (like from a database). Hence why I don't use the "paint the cells" implementation of your link, which was quite elegant btw.
Lasse V. Karlsen
A: 

Here's an O(n lg n) runtime approach using some memory.

Using the example:

We're only interested in the subpart of the scene that contains the 'model' rectangle; in this example, the 'model' rectangle is 1,1 -> 6,6

   1   2   3   4   5   6

1  +---+---+
   |       |   
2  +   A   +---+---+
   |       | B     |
3  +       +   +---+---+
   |       |   |   |   |
4  +---+---+---+---+   +
               |       |
5              +   C   +
               |       |
6              +---+---+

1) collect all the x coordinates that are within the bounds of the model rectangle (both left and right) into a list, then sort it and remove duplicates

1 3 4 5 6

2) collect all the y coordinates that are within the bounds of the model rectangle (both top and bottom) into a list, then sort it and remove duplicates

1 2 3 4 6

3) create a 2D array by number of gaps between the unique x coordinates * number of gaps between the unique y coordinates. This can use a single bit per cell, and you can consider using say C++ STL's bit_vector() for an efficient representation.

4 * 4

4) paint all the rectangles into this grid, painting cell it occurs over:

   1   3   4   5   6

1  +---+
   | 1 | 0   0   0
2  +---+---+---+
   | 1 | 1 | 1 | 0
3  +---+---+---+---+
   | 1 | 1 | 2 | 1 |
4  +---+---+---+---+
     0   0 | 1 | 1 |
6          +---+---+

5) Should any cells remain unpainted, you know your model is not completely occluded (or whatever you are testing).

The gathering coordinates and the painting steps are O(n), and the sorting of the coordinates is O(n lg n).

This is adapted from one of my answers to: http://stackoverflow.com/questions/244452/what-is-an-efficient-algorithm-to-find-area-of-overlapping-rectangles/298269#298269

Will
I don't understand how this is `O(n log n)`: the grid has about `O(n^2)` elements, and you populate them one by one. It's basically a raster approach.
Matthieu M.
this is also what @Larry wrote
Timmy
+1  A: 

I've been thinking about it and I think I finally understood what @algorithmist meant by sweep line. However the very presence of sorting operations means that I have:

  • O(n log n) in average
  • O(n**2) in the worst case

Sweep Line

First of all, we need some sorting, because our sweep line will consist of walking an ordered set.

This ordered set will feature the top and bottom line of each of the reds, as long as they are between the top and bottom of blue. This divides our space into (at most) n*2 horizontal strips.

Now, we need to make sure that in each strip, the whole of blue is covered, and this operation cannot have more than O(log n) complexity, this could be done if we had (for each strip) a list of the covered intervals. This is the 'event' @algorithmist is speaking of

To handle this event, we'll use a binary tree described below which handles adding or removing a rectangle in exactly O(log n) operations and yields the rightmost interval covered by the tree, which we use to tell if the strip of blue is covered or not.

Binary Tree

First of all, I am going to index the red rectangles. We sort them using this function:

def __lt__(lhs, rhs):
  return (lhs.left <  rhs.left)
      or (lhs.left == rhs.left and lhs.right < rhs.right)

I am going then to create a dedicated binary tree.

  • It will have N leaves, each representing a red rectangle and indicating its presence or absence. They are ordered according to the index.
  • Each intermediary node will have for value the rightmost interval covered by its children

Handling the bug "code block cannot follow list":

class Node:
  def __init__(self):
    self.interval = []
    self.left  = None
    self.right = None

Now we have two possibilities, let's say the children cover [a,b] and [c,d]:

  • if c <= b, then the node hold [a,d]
  • else it holds [c,d]

Why does it works ? Let's take an example using 4 leaves:

        _ [1,9] _
       /         \
   [1,7]         [6,9] <-- Special node merge
   /   \         /   \
  /     \       /     \
[1,3] [2,7]   [3,5] [6,9]

The special node ignore [3,5] because it's not the rightmost interval. The reasoning is that the rectangles are ordered:

  • No rectangle on the right of [6,9] will cover the missing [5,6] interval since they begin after 6
  • The rectangles on the left of [3,5] begin before 3, so if they cover the missing [5,6] they'll cover [3,5] anyway

The root may not indicate the exact set of intervals covered: only the rightmost interval covered. However, it's perfectly sufficient for us to tell if blue is completely covered or not!

There are 2 operations available on this tree:

  • Marking a rectangle as present
  • Marking a rectangle as absent

Each is similar:

  • if the rectangle was already in this state, do nothing
  • else, toggle its state, then update its parent interval (recursively, up to the root)

The recursive bit takes O(log n). It's a classic property of the balanced binary trees. And once it's done we have the rightmost interval covered by the root which is sufficient to tell whether or not the blue segment is entirely covered or not.

Complexity

The complexity of the algorithm is simple:

  • We have O(n) events
  • Each event is handled in O(log n)

Which yields O(n log n) for the core part.

However, we should not forget that we also have 2 sort operations:

  • one to classify the events (for the sweep line)
  • the other to classify the rectangles (for the binary tree)

Each shall take O(n log n) in average, but may degenerate into O(n**2) in the worst case, depending on the sorting algorithm used.

Matthieu M.
I think the problem is not the insertion/deletion, but checking each step of the way that the entire blue interval is covered and still covered at `O(log N)` time. At least, that was what I had trouble doing, unless I'm missing something obvious.
Larry
I understand, the article said "query time is O(log n)" so I had assumed it would be the easy part, but I think it only refers to querying whether a point (and not an interval) is covered.
Matthieu M.
Okay, I created a new structure for this problem which guarantees `O(log n)` operations to add or remove an interval. It's based on the fact that we know the intervals at hand beforehand.
Matthieu M.
About sorting: if we use heap sort we can do this by O(n log n). Exist teorema that the best is O(n log n).
den bardadym
Thanks. For your answer.
den bardadym
+1  A: 

Okay, now it seems I can't even sleep at night because I think about this problem... but it also seems I finally got an O(n log n) solution, in average case (but with reduced chances of having a pathological input compared to @lVlad).

We all know the Divide and Conquer technic. To ensure O(n log n) using it, we usually focus on 2 points:

  • the divide and merge should be O(n)
  • there exist C > 1 such that at each step the size of the subproblems is reduced by a factor C (constant throughout the problem)

With these constraints in mind I have elaborated the following algorithm, which is reminiscent of qsort, and thus suffer the same pitfalls (namely, fractal inputs).

Algorithm

  1. Clipping: we only consider the portion of a red that intersect with blue, they are inserted in a HashSet to remove duplicates --> O(n)
  2. Pivot Selection: more on this later --> O(n)
  3. Partition: once we have a pivot, we subdivise the space in 3d zones, one of which being the pivot, with d being the dimension (d = 1 for segments, 2 for rectangles, 3 for cubes etc...)
  4. Repartition: we put the red in the partitions, applying the Clipping technic, note that a given red might end up giving several chunks in different partitions
  5. Recursion: we apply recursively (from step 2) on each partition, beginning by the least populated one and stopping as soon as one is not covered

The Pivot Choice is the corner stone of the algorithm, if the partition is ill-tailored we cannot achieve the required complexity. Also, it must be accomplished in O(n). I have 2 proposals so far:

  • Maximum Area: use the rectangle with the greatest area, because it means that the partitions will have the smallest area afterward, however it suffers from being easy to trump
  • Median of 3: based on qsort, pick up 3 elements, selection the median (the one with the center closer to the barycenter of the 3 centers)

I propose to mix them up thusly:

  • Pick up the 3 elements with the greatest area, select the median, use it at pivot
  • If after the repartition it turns out that one of the partition is populated with more than N/C (to be customized) elements, pick up 3 elements at random, select the median, and do the repartition (no check here)

Another aspect of implementation is the Tail of the recursion. Like qsort it's probable that the algorithm is inefficient for small n. Instead of going all the way to 1, I propose to use the introsort trick: if n is smaller than say 12, then use the following algorithm instead:

  • For each axis, project each red on the axis (only the edges) and sort them (ala introsort)
  • This defines a raster of nd zones, check that each is covered

Dimension agnostic

The algorithm is formally defined to be applicable in any given dimension with the same asymptotic complexity, in average O(n log n). This gives us the opportunity to test in dimension 1 to identify the pathological inputs.

Pathological input

Like qsort on which it is modelled it is sensible to factorial inputs. By factorial inputs I mean:

1.......6...9.11.13

whenever you pick the average of your interval, you have all the elements on one side of it.

With such an input even choosing the median of 3 is unlikely to yield a very good cut.

EDIT:

I am going to show the partition idea with a little scheme, as @lVlad noted it was kind of fuzzy.

+----------------+----+---------+
|       1        | 2  |   3     |
+----------------+----+---------+
|       8        | P  |   4     |
+----------------+----+---------+
|       7        | 6  |   5     |
+----------------+----+---------+

Okay, so the rectangle we check for coverage is now splitted into 9 subrectangles. We ignore P, it's our pivot.

Now, we would really like that each subrectangle is covered by less red than N. The pivot is chosen as a barycenter of the centers, thus it means if our "random" choice held true that there are about as many reds centers in each direction of the pivot.

It's kind of fuzzy there because some special configurations might make it so that there is little gain at one step (all rectangles have the same center and we just picked the smaller one for example), but it will create chaos and thus the following step will be different.

I am happy if anyone can formalize that, I am an engineer, not a computer scientist, and my maths lag behind...

Matthieu M.
I don't really get it. How exactly do you make sure that the subrectangle corresponding to each recursive call does not generate (or rarely generates) `O(n)` checks? Could you post some pseudocode?
IVlad
Okay I edited. It's obvious when we use the first partionning scheme (max area) since we check that no sub-problem has more than n/c inputs (with for example `c == 2`). However I must admit that though it should be possible to prove it thanks to the fact that `P` is chosen because of its centered position, it's kind of difficult to prove that the repartition will work. Probably because there are some cases (like fractal) where it does not.
Matthieu M.
Maybe it's me, but I don't really understand your method at all :). I get that you try to split the blue rectangle into subrectangles, but what do you do with those? You check if they are contained entirely within a blue rectangle, don't you? Do you have a way to do this in `O(N)` per LEVEL of recursion, rather than `O(N)` per RECURSIVE CALL? You talk about reducing the problem to 1d through projections, but I'm not sure how those would work either...
IVlad
Let's talk in 1 dimension. I have 1 `blue` segment and `N` `red` segments that are aligned and I wish to know whether or not the `red`s cover the `blue` in `O(N log N)` time. Clipping means that I discard any `red` bit that is outside of the `blue` segment (they don't participate in the coverage). I pick 3 `red` and select my pivot among them. I then partition `blue` into 3 subsegments (before pivot, pivot, after pivot) and recurse on the 2 extremes. Notice the parallel with `quick sort` here, that's what make me claim that the complexity should be akin to the one of `quick sort`.
Matthieu M.