views:

507

answers:

8

The problem

I have an array of java.awt.Rectangles. For those who are not familiar with this class, the important piece of information is that they provide an .intersects(Rectangle b) function.

I would like to write a function that takes this array of Rectangles, and breaks it up into groups of connected rectangles.

Lets say for example, that these are my rectangles (constructor takes the arguments x, y, width,height):

Rectangle[] rects = new Rectangle[]
{
    new Rectangle(0, 0, 4, 2), //A
    new Rectangle(1, 1, 2, 4), //B
    new Rectangle(0, 4, 8, 2), //C
    new Rectangle(6, 0, 2, 2) //D
}

A quick drawing shows that A intersects B and B intersects C. D intersects nothing. A tediously drawn piece of ascii art does the job too:

┌───────┐   ╔═══╗
│A╔═══╗ │   ║ D ║
└─╫───╫─┘   ╚═══╝
  ║ B ║                 
┌─╫───╫─────────┐
│ ╚═══╝ C       │
└───────────────┘

Therefore, the output of my function should be:

new Rectangle[][]{
    new Rectangle[] {A,B,C},
    new Rectangle[] {D}
}

The failed code

This was my attempt at solving the problem:

public List<Rectangle> getIntersections(ArrayList<Rectangle> list, Rectangle r)
{
    List<Rectangle> intersections = new ArrayList<Rectangle>();
    for(Rectangle rect : list)
    {

        if(r.intersects(rect))
        {
            list.remove(rect);
            intersections.add(rect);
            intersections.addAll(getIntersections(list, rect));
        }
    }
    return intersections;
}

public List<List<Rectangle>> mergeIntersectingRects(Rectangle... rectArray)
{
    List<Rectangle> allRects = new ArrayList<Rectangle>(rectArray);
    List<List<Rectangle>> groups = new ArrayList<ArrayList<Rectangle>>();
    for(Rectangle rect : allRects)
    {
        allRects.remove(rect);
        ArrayList<Rectangle> group = getIntersections(allRects, rect);
        group.add(rect);
        groups.add(group);
    }
    return groups;
}

Unfortunately, there seems to be an infinite recursion loop going on here. My uneducated guess would be that java does not like me doing this:

for(Rectangle rect : allRects)
{
    allRects.remove(rect);
    //...
}

Can anyone shed some light on the issue?

+1  A: 

I'm not up on my java foo, but I guess the problem is that you're removing items from your list while iterating the list. Depending on the implementation of the container type this can have big problems. Someone with more Java knowledge may be able to confirm or deny this.

This SO Question seems to confirm my suspicions.

After googling a bit it seems that java iterators support a remove method, so instead of

allRects.remove(rect);

you should use an iterator and then use

rect_i.remove();

and the same for

list.remove(rect);

Although I think this will still get you in trouble since you are modifying the same list at a lower level in the call stack.

My version:

ArrayList<Rectangle> rects = new ArrayList<Rectangle>(rectArray);
ArrayList<ArrayList<Rectangle>> groups = new ArrayList<ArrayList<Rectangle>>();
while (!rects.isEmpty)
{
    ArrayList<Rectangle> group = new ArrayList<Rectangle>();
    ArrayList<Rectangle> queue = new ArrayList<Rectangle>();
    queue.add(rects.remove(0));
    while (!queue.isEmpty)
    {
        rect_0 = queue.remove(0);
        rect_i = rects.iterator();
        while (rect_i.hasNext())
        {
            Rectangle rect_1 = rect_i.next();
            if (rect_0.intersects(rect_1))
            {
                queue.add(rect_1);
                rect_i.remove();
            }
        }
        group.add(rect_0);
    }
    groups.add(group);
}

Note: I think the code's correct now, but I wrote this up just from reference docs and I'm no Java coder, so you may need to tweak.

As an aside, this type of naive algorithm is fine if you have a small list of rectangles that you need to check, but if you want to do this for very large lists, then you will want to use a much more efficient algorithm. This naive algorithm is O(n^2), a smarter algorithm that first sorts all rectangle corners lexicographically and then performs a plane sweep and does range intersection checking on the sweep line would result in a relatively straightforward O(n log n) algorithm.

wich
`rect` itself is not an iterator, it's a rectangle. The iterator is kind of "behind the scenes" when you use the so called "for each" syntax. But you could (I think) expose the iterator explicitly with all its (IMHO relatively ugly) hasNext() and getNext() methods and such.
MatrixFrog
ah, thanks, normally I don't touch Java with a 10 foot pole, so have to look everything up... Will update my answer.
wich
Your call to `groups.add(group)` should be `groups.addAll(group)`. Moreover, using an `ArrayList` of `ArrayList`s to store connected components is kind of clunky. It may be better to use an `ArrayList` of `Set` s, or even a `Set` of `Set`s, since order doesn't appear to be important.
ntownsend
@ntownsend no it shouldn't be addAll, that would result in groups ending up to be the list we started with, I want to add the list, I don't want to add all elements in the list. As for the whole ArrayList deal, I kept it as the OP had it, probably LinkedLists would be better here given that elements will be removed pretty much in random order from the list.
wich
@wich Whoops! You're totally right. I gapped it on that one.LinkedLists are also a good call.
ntownsend
A: 

You cannot remove an object from a list your iterating through, iterator object or not, you will need to find another way

walnutmon
That's not correct, iterators have a remove method, that is, if the container supports it.
wich
+1  A: 

(this is too long for a comment)

A quick drawing does NOT show that A intersects B: A has an height of 4 while B start at an Y position of 5, how could they intersect!?

You can verify it with the following that prints out 'false':

System.out.println( new Rectangle(0, 0, 2, 4).intersects( new Rectangle(1, 5, 4, 2) ) );

Then your method signature are incomplete so is your code example.

If you clarify a bit your problem and give a working, correct, example, then I've got a very nice solution for you.

Webinator
Oops. You're absolutely right. I'll edit my example.
Eric
The algorithm should work regardless of the input example being flawed, right now the algorithm is incorrect, so even if A would be made to intersect B it would still fail.
wich
Exactly what do you mean by my method signatures are incoreect? I also can't see why the algorithm is incorrect. It seems like it should work. The main issue is that I can't trace the source of the infinite loop.
Eric
@Eric your mergeIntersectingRects() says it returns a Rectangle[] (array of rectangles) but what you want is something more like the local variable `groups`: A list of lists of rectangles. (Or, like in my answer, maybe a List of Sets of rectangles, or something along those lines.) When you cast `groups` to Rectangle[], I'm not sure what that will do.
MatrixFrog
@Eric, you cannot remove elements from a list while iterating over it using a for each, see http://stackoverflow.com/questions/1196586/calling-remove-in-foreach-loop-in-java
wich
That cast wasn't me. Someone else must have edited it. Thanks for spotting that the dimensions didn't match. I guess I'll change it to List<List<Rectangle>>, as converting to Rectangle[][] is more hassle than it's worth.
Eric
+2  A: 

Alright, I think I got it. This algorithm is rather inefficient, O(n^3) by wich's calculation, but it does seem to work.

I used Set instead of List in getIntersections() to keep from counting the same rectangle twice (although I don't think this is actually necessary). I guess your final result could even be a Set<Set<Rectangle>> but the algorithm should be about the same. I also used Lists everywhere instead of arrays because I think arrays are ugly, but it's easy enough to convert back if you need to. The set newRectanglesToBeAdded lets us decide whether we need to keep looping or not, and also keeps us from adding to a list while we're iterating over it (which is just as bad as trying to remove things from a list while we're iterating over it). I don't think it's the most elegant solution, but it seems to work (at least for the test data you provided).

  public static Set<Rectangle> getIntersections(List<Rectangle> list,
      Rectangle r) {
    Set<Rectangle> intersections = new HashSet<Rectangle>();
    intersections.add(r);

    Set<Rectangle> newIntersectionsToBeAdded = new HashSet<Rectangle>();

    do {
      newIntersectionsToBeAdded.clear();
      for (Rectangle r1 : list) {
        for (Rectangle r2 : intersections) {
          if (!intersections.contains(r1) && r2.intersects(r1)) {
            newIntersectionsToBeAdded.add(r1);
          }
        }
      }
      intersections.addAll(newIntersectionsToBeAdded);
    } while (!newIntersectionsToBeAdded.isEmpty());
    return intersections;
  }

  public static List<Set<Rectangle>> mergeIntersectingRects(List<Rectangle> allRects) {
    List<Set<Rectangle>> grouped = new ArrayList<Set<Rectangle>>();
    while (!allRects.isEmpty()) {
      Set<Rectangle> intersections = getIntersections(allRects, allRects.get(0));
      grouped.add(intersections);
      allRects.removeAll(intersections);
    }
    return grouped;
  }
MatrixFrog
If i see it right this algorithm will be O(n^3)... probably not a good idea. One of two worst case examples; a list of n rectangle, where each rectangle r_i intersects with r_{i-1} and r_{i+1} but not any others. Then the `do ... while` will add only one rectangle in each iteration, i.e. n iterations. `for (r1)` always runs over the whole list, so n iterations, `for (r2)` runs over the selection up to now, ranging from 0 to n, or n/2 on average. The `while (!empty)` will thankfully only run once. This leads to O(n * n * n/2) = O(n^3).
wich
Unfortunately, I'm stuck with lists: the version of java that I'm using (LeJOS NXJ) is rather a limited subset, and consequently, does not include the `Set` class.However, I'm sure that I can get around that. Thanks. Any idea what the flaw is in my algorithm?
Eric
Other than the "can't modify a list while you iterate over it" issue that people have mentioned, it seems fine. The set/list thing shouldn't be an issue, as long as you have some way of ensuring you don't add any duplicates (for one thing, because you don't want duplicates, and for another, because that would cause an infinite loop if your algorithm is roughly similar to mine).
MatrixFrog
+2  A: 

What you want is to find connected components. That is, imagine a graph whose vertices correspond to rectangles, and where there is an edge between two vertices if the corresponding rectangles intersect. Then, you want to find and label the connected components of this graph.

Just finding the edges (determining, for each pair of rectangles, whether they intersect) takes O(n2) time, after which you can use either depth-first search or breadth-first search to find all the components in an additional O(E) time, where E < n2.

In pseudocode (simple exercise to translate it to Java), it may look something like this:

# r is the list of rectangles
n = length of r (number of rectangles)

#Determine "neighbors" of each vertex
neighbors = (array of n lists, initially empty)
for i in 1 to n:
    for j in 1 to n:
        if i!=j and r[i].intersects(r[j]):
            neighbors[i].append(j)

#Now find the connected components
components = (empty list of lists)
done = (array of n "False"s)
for i in 1 to n:
    if done[i]: continue
    curComponent = (empty list)
    queue = (list containing just i)
    while queue not empty:
        r = pop(head of queue)
        for s in neighbors[r]:
            if not done[s]:
                done[s] = True
                queue.push(s)
                curComponent.push(s)
    #Everything connected to i has been found
    components.push(curComponent)

return components

I'm precomputing neighbors and using "done" labels to save the O(n) factor and make the whole thing O(n2). In fact, this algorithm is for general graphs, but because your graph is rather special — comes from rectangles — you can do even better: it is actually possible to solve the problem in O(n log n) time total, using segment trees.

ShreevatsaR
I think the claimed `O(n log n )` algorithm is probably not true. I think its more like `O( n^2 log n )`. I think your reference to segment trees is equivalent to an efficient sweep line algorithm (which uses R-B trees) and sweep line algorithm in this case would run O( n^2 log n ) because it has to stop at all intersection of which there may be O(n^2) of. However, even so the expected run time of a sweep line algorithm is better than the O( n^2 ) approach.
ldog
An O(n log n) algorithm is definately possible, Imai and Asano showed an O(n log n) algorithm already in 1983: http://dx.doi.org/10.1016/0196-6774(83)90012-3Also a sweep line algorithm doesn't have to stop at all intersections, it needs to step at each point the sweep line changes, which is either all west and east sides of the rectangles or all north-south sides of the rectangles, depending on whether one sweeps over x or y. Which naturally leads to 2*n sweep events. The trick is to handle all events in O(log n) time.
wich
The paper says that they find the connected components by finding the graph G=(V,E) where the set of vertices V are the rectangles and the edge (u,v) iff rectangle u and v intersect. Such a graph has possibly O(n^2) edges (where n is the number of rectangles), so unless they represent their output differently their claim can not be true.
ldog
since just writing their output would require O(n^2) time. Most sweep algorithms that deal with intersections have run times of O( k log n) where k is the number of intersections (edges in G), which are still a big improvement on the slow O(n^2) algorithm since k is "usually" O(n). In some sense you can show the expected time is O( n log n) but not the worst case time.
ldog
@gmatt: You shouldn't be so skeptical. :-) It is true that if you stop at each intersection point, or if you write out the entire intersection graph, you need Ω(n^2), but the trick is to *not* do that. The authors do indeed solve the problem in O(n log n) **worst case**. (I haven't read the papers carefully, but consider the simpler problem of finding, say, the area of the union: again there are O(n^2) intersection points, but with an interval/segment tree, you just sweep left-to-right, and add/remove each vertical segment (when you enter or exit a rectangle) in O(log n), for O(n log n) total.
ShreevatsaR
Didn't have space to link to papers available online: see http://www.cs.duke.edu/~edels/Papers/1984-J-01-ConnectedComponents.pdf by Edelsbrunner et al. which is cited for the O(n log n) algorithm, and also this paper that can do it O(n log u / log log u) where u is the size of the universe (largest numbers in input): http://www.cs.uu.nl/research/techreps/repo/CS-1986/1986-18.pdf
ShreevatsaR
ShreevatsaR problem with such a sweep algorithm is not the inserting and deleting the vertical segments, but that, to join all intersecting groups, you need to find all intervals in the interval tree that the newly inserted interval intersects with, which is a range query between y_low and y_high. That however is a O(log n + m) operation, with m the number of reported intersections, which in our case could be O(n). Such an algorithm would result in an O(n log n + k) algorithm, with k the number of intersections, which gmatt is referring to.
wich
However the point is that once a group of intervals is joined one doesn't have to report them all every time a new segment intersects them, as soon as you know the new segments intersects one of them, you can forget about the rest, the trick is to do this the right way.
wich
wich you correctly understood my point about O( n log n + k) where k is the number of intersections. Your point if I understand it correctly is that once you determine that two or more rectangles intersect you can treat them as one unit ( or connected component ) which would simplify that graph I was alluding to earlier. I think in this case you will be transforming the graph (grouping multiple nodes to become one node) and so you will have to relink existing edges in the so the overall number of operations won't change. I will read those papers ShreevatsaR posted and see what they say.
ldog
You don't relink anything since that graph doesn't exist at all, it's merely an implied graph. There is no physical representation of the intersection graph at any point in time.
wich
A: 

Connected components.

Alternatively, because you only have rectangles you may be able to design an very efficient sweep line algorithm.

I would expect the best algorithm to take at least O( n^2 ) time because given n rectangles there are O( n^2 ) possible intersections, and any algorithm to compute what you want would need to consider all intersections at some point.

ldog
O(n log n) is definately possible with a plane sweep algorithm, I'm working on an implementation in C++, which I'll post tomorrow.
wich
I look forward to that algorithm, if it exists.
ldog
Still struggling to manage the sweep line structure in O(log n) time. I have one edge case that degenerates my structure into a linear depth tree, which, funnily enough is not the O(n^2) intersections case, but a series of nested Us. I will persevere though, way too much fun to do this project.
wich
As for the possibility of an O(n log n) algorithm, Imai and Asano showed an O(n log n) algorithm already in 1983: http://dx.doi.org/10.1016/0196-6774(83)90012-3The point is that that the rectangles are orthogonal so one can use the sorted x and y coordinates in a smarter way.
wich
that paper is interesting because they say they can find the graph in O( n log n ) time but that graph can have O( n^2) edges, which implies they do not store the graph as G=(V,E) otherwise their O(n log n) time is broken since E=O(n^2). In that case, I suspect their algorithm is incomplete, I wish I could read the paper.
ldog
I think they meant to write that the algorithm is O( k logn n) where k is the number of intersections, that is much more believable for me.
ldog
Nope, it is definitely O(n log n) with n the number of rectangles, you don't need to concern yourself at all with the number of intersections, you don't check them all and you don't report them all, you only report the connected components.
wich
+1  A: 

If you desire an O(n log n) algorithm, one was shown by Imai and Asano in Finding the connected components and a maximum clique of an intersection graph of rectangles in the plane.

Note: I'm still working on my own plane sweep algorithm to find the set in O(n log n) time.

wich
A: 

This is the solution I went for in the end. Can anyone take a guess to its efficiency?

package java.util;

import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class RectGroup extends ArrayList<Rectangle> implements List<Rectangle>
{
    public RectGroup(Rectangle... rects)
    {
            super(rects);
    }

    public RectGroup()
    {
        super();
    }

    public boolean intersects(Rectangle rect)
    {
        for(Rectangle r : this)
            if(rect.intersects(r))
                return true;

        return false;
    }

    public List<RectGroup> getDistinctGroups()
    {
        List<RectGroup> groups = new ArrayList<RectGroup>();
        // Create a list of groups to hold grouped rectangles.

        for(Rectangle newRect : this)
        {
            List<RectGroup> newGroupings = new ArrayList<RectGroup>();
            // Create a list of groups that the current rectangle intersects.

            for(RectGroup group : groups)
                if(group.intersects(newRect))
                    newGroupings.add(group);
            // Find all intersecting groups

            RectGroup newGroup = new RectGroup(newRect);
            // Create a new group

            for(List<Rectangle> oldGroup : newGroupings)
            {
                groups.remove(oldGroup);
                newGroup.addAll(oldGroup);
            }
            // And merge all the intersecting groups into it

            groups.add(newGroup);
            // Add it to the original list of groups
        }
        return groups;
    }
}
Eric