I am working with a grid of squares which have two states, "ON" and "OFF." I have a rather simple Connected Component Labeling algorithm which finds all of the "ON" components. Usually, but not always, there is exactly one "ON" component.

I wish to construct an algorithm which takes in as input a matrix of on/off cells, a component labeling (probably formatted as a list of hashsets of cells), and a list of cells that have changed since the labeling was formed, and output a new labeling. The obvious solution is to just recalculate from scratch, though this is not efficient. In general the list of cells that have changed will be small.

In the case of the change list being only cells that have turned ON, this is easy to do:

Groups G;
Foreach changed cell C:
  Group U = emptygroup;
  U.add(C);
  Foreach Group S in G:          // iterate over a snapshot if G is modified inside the loop
    if (S contains a cell which is adjacent to C)
      G.Remove(S);
      U.UnionWith(S);
  G.add(U);                      // add the merged group U, not the bare cell C
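
In actual code the same loop might look roughly like the following C# sketch (Cell and IsAdjacentTo are stand-ins for whatever cell representation is in use); the two details that matter are that the group list is snapshotted before being modified and that the merged group, not the bare cell, is what gets added back:

// Sketch: merging groups when cells turn ON.
// Cell and Cell.IsAdjacentTo are hypothetical names; requires System.Linq.
void AddOnCells(List<HashSet<Cell>> groups, IEnumerable<Cell> turnedOn)
{
    foreach (Cell c in turnedOn)
    {
        var merged = new HashSet<Cell> { c };
        // Snapshot the touching groups so 'groups' is not modified mid-enumeration.
        var touching = groups.Where(g => g.Any(cell => cell.IsAdjacentTo(c))).ToList();
        foreach (var g in touching)
        {
            groups.Remove(g);
            merged.UnionWith(g);
        }
        groups.Add(merged); // the merged group, not just the cell, rejoins the list
    }
}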

However, if the changes contain any cells which have turned OFF, I am not sure what to do. Keep in mind that all ON cells must be a member of exactly one group. One solution would be to take each cell that is adjacent to the newly OFF cell and see whether those neighbours are still connected to one another (e.g. using A* pathfinding). This yields 1-4 contiguous groups (or 0 groups if the removed cell was the only cell in its group and thus has no ON neighbours to check). However, this is only a little better than starting from scratch, since usually (but not always) connecting these adjacent squares is about as much work as finding a contiguous group outright (unless someone has a suggestion for a smarter way to do that). It also gets a bit scary if there are a lot of changed cells...though I admit there usually aren't.
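
For what it's worth, that neighbour check looks roughly like this as code (a sketch only; Grid.IsOn and Grid.Neighbors are made-up helpers, and the removed cell is assumed to already be marked OFF). Each pass of the outer loop flood fills one of the 0-4 resulting fragments, and neighbours already swallowed by an earlier pass are skipped:

// Sketch: after 'removed' turns OFF, rebuild only the groups around it.
// Grid.IsOn and Grid.Neighbors are assumed helpers, not a real API.
List<HashSet<Cell>> SplitAround(Grid grid, Cell removed)
{
    var fragments = new List<HashSet<Cell>>();
    var seen = new HashSet<Cell>();
    foreach (Cell start in grid.Neighbors(removed))
    {
        if (!grid.IsOn(start) || seen.Contains(start)) continue;
        var fragment = new HashSet<Cell> { start };
        var stack = new Stack<Cell>();
        stack.Push(start);
        while (stack.Count > 0)
            foreach (Cell n in grid.Neighbors(stack.Pop()))
                if (grid.IsOn(n) && fragment.Add(n)) stack.Push(n);
        seen.UnionWith(fragment); // any other neighbour reached here is in the same fragment
        fragments.Add(fragment);
    }
    return fragments;
}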

Context, for those who insist on knowing why I am doing this: one rule in Nurikabe puzzles is that you may only have one contiguous group of walls. The problem above is a simplification of a problem I am trying to solve for increased speed (and to play with pathfinding). Basically, I wish to check for contiguous walls without wasting the information gained from previous such tests. I am trying to see how many places in my solver I can reuse previous information in order to improve speed, since it seems kind of painful to use an O(f(N)) algorithm when an O(f(Δ)) algorithm will suffice (N being the size of the puzzle and Δ being the changes made since the algorithm was last run).

Profiling does indicate that improving this algorithm would make a difference to execution time, but this is a project for fun rather than profit, so it doesn't really matter that much, beyond being able to measure whether the change had any influence.

Note: I have omitted explaining my current algorithm, but it basically does a stack-based flood fill from the first ON square it finds, then checks whether any ON squares were left unvisited (which would mean there is more than one group, which it does not bother to examine further).
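
For reference, that check is roughly the following sketch (Grid.AllCells, Grid.IsOn, and Grid.Neighbors are invented names; requires System.Linq):

// Sketch of the current from-scratch test: flood fill from the first ON square,
// then any ON square left outside the fill means there is more than one group.
bool IsSingleGroup(Grid grid)
{
    var onCells = grid.AllCells().Where(grid.IsOn).ToList();
    if (onCells.Count == 0) return true; // no ON squares at all, so nothing can be split
    var filled = new HashSet<Cell> { onCells[0] };
    var stack = new Stack<Cell>();
    stack.Push(onCells[0]);
    while (stack.Count > 0)
        foreach (Cell n in grid.Neighbors(stack.Pop()))
            if (grid.IsOn(n) && filled.Add(n)) stack.Push(n);
    return filled.Count == onCells.Count; // every ON square was reached from the first one
}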

Edit: Enhancement idea: Yairchu's and John Kugelman's suggestions crystallized in my head into this improvement, which is not actually a solution to this problem per se, but may make this part of the code and several other pieces of code run faster:

Current loop:

// Every visit re-tests the square's type against 'match'.
foreach (Square s in m.Neighbors[tmp.X][tmp.Y])
{
    if (0 != ((byte)(s.RoomType) & match) && Retval.Add(s)) curStack.Push(s);
}

Improvement idea:

// NeighborsXX would already be filtered to the matching type, so no per-visit test.
foreach (Square s in m.NeighborsXX[tmp.X][tmp.Y])
{
    if (Retval.Add(s)) curStack.Push(s);
}

This would require maintaining several m.NeighborsXX instances (one for each type of match that needs to be sped up) and updating them all whenever a square changes. I'd need to benchmark it to see whether it actually helps, but it looks like a standard case of trading some memory for some speed.
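
A rough sketch of what keeping one of those filtered lists in sync might look like (NeighborsWall, Neighbors, and IsWall are invented names):

// Sketch: one filtered neighbour table per square type of interest, updated
// whenever a square changes. Only the lists of the changed square's neighbours are touched.
List<Square>[][] NeighborsWall; // NeighborsWall[x][y] = neighbours of (x, y) that are currently walls

void OnSquareChanged(Square s)
{
    foreach (Square n in Neighbors[s.X][s.Y])
    {
        List<Square> list = NeighborsWall[n.X][n.Y];
        if (IsWall(s)) { if (!list.Contains(s)) list.Add(s); }
        else list.Remove(s);
    }
}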

A: 

Interesting problem! Here's my initial thought. Hopefully I'll have more and will update this answer as they come...

[Update 2] Since you only care about one group, the A* search seems ideal. Have you profiled the A* search vs. relabeling? I have to think that a well-written A* search would be faster than the flood fill. If not perhaps you can post your actual code for optimization help?

[Update 1] If you know the newly OFF cell C is in group G, then you could re-run the CCL algorithm but only have to re-label the cells in group G. The other ON cells could retain their existing labels. You would not have to examine the rest of the grid, which could be significant savings compared to the initial CCL of the entire grid. (As an avid Nurikabe solver myself, this should be at least a 33% savings in a solved puzzle, and a very significant savings in an in-progress puzzle, no? "33%" coming from my guesstimate that solved puzzles are about 2/3 black and 1/3 white.)

To do this you'd have to store a list of cells contained in each group so that you could quickly iterate over the cells in group G and re-label only those cells.
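
In code, restricting the re-labeling to the old group might look something like this sketch (assuming the group is stored as a HashSet<Cell> with the turned-OFF cells already removed from it; Grid.Neighbors is an invented helper and System.Linq is needed for First):

// Sketch: re-run the labeling only over the cells of the group that lost a cell.
List<HashSet<Cell>> RelabelGroup(Grid grid, HashSet<Cell> oldGroup)
{
    var pieces = new List<HashSet<Cell>>();
    var unvisited = new HashSet<Cell>(oldGroup);
    while (unvisited.Count > 0)
    {
        Cell seed = unvisited.First();
        var piece = new HashSet<Cell> { seed };
        var stack = new Stack<Cell>();
        stack.Push(seed);
        while (stack.Count > 0)
            foreach (Cell n in grid.Neighbors(stack.Pop()))
                if (oldGroup.Contains(n) && piece.Add(n)) stack.Push(n); // never leave the old group
        unvisited.ExceptWith(piece);
        pieces.Add(piece);
    }
    return pieces; // these replace the old group in the labeling; other groups keep their labels
}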

John Kugelman
@John: An interesting thought, but since there is usually only one group, this change does very little. Alternatively, I could argue that my current algorithm already has this improvement implicitly, since, once it finds the first group, it gives up if it encounters an ON cell that is not in the group. The "stop if encountering a cell not in the group" could probably be optimized away...but it's so fast that it really doesn't matter.
Brian
Updated... are you sure A* is no better than flood fill?
John Kugelman
Re: A*: switching to A* would probably reduce the number of squares that need to be scanned, but I am hesitant to even test it, as it gets annoying to do when there are a lot of changes. Your comments on savings do bring up the rather interesting idea of maintaining separate neighbor lists depending on the type of neighbor, thus yielding a faster traversal. That would probably bring consistent speedups in several places all over the program.
Brian
+1  A: 

Not a complete solution, but here goes:

  • For each connected component keep a spanning tree in memory
    • Tree property A: Our spanning tree has a notion of which node is "above" which (like in search trees). The choice of which is above which is arbitrary
  • Let's discuss removing and adding edges
  • When adding an edge:
    • Check if the two nodes are in the same component by checking if their trees' roots are the same
      • Tree property B: The tree should be dense so this check would be O(log n)
    • If in same group then do nothing
    • If they are in different groups then join the trees with the new edge.
      • This would require transforming "the shape" (the definition of who is above whom) of one of the trees so that our new edge can be "above" it
  • When removing an edge:
    • If this edge does not participate in a group's spanning tree then do nothing.
    • If it does, we would need to check if the group is still connected
      • DFS from one group to try to reach the other one
      • Better do it from the smaller of the two
        • Tree property C: We maintain for each node in the tree the size of its subtree
        • Using property C we can tell both groups' sizes
      • Because of property B: usually the smaller group will be very small and the larger group will be very large
      • If the groups are connected then we act as if we added the connecting edge
      • If the groups are not connected then we should climb the tree to maintain property C (subtract the size of the previously connected subtree from the ancestors' subtree sizes)
  • Problem: How do we maintain property B (the tree is dense)?

I hope this makes sense :)
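
A minimal sketch of properties A and C only (parent pointers plus subtree sizes; TreeNode and the method names are illustrative, and the re-shaping on join and the edge-removal handling, which are the interesting parts, are left out):

// Sketch of properties A and C: each node knows its parent (null at the root) and the
// size of the subtree below it. Root().SubtreeSize is the whole component's size,
// which is what lets you pick the smaller side for the DFS.
class TreeNode
{
    public TreeNode Parent;      // null means this node is the root of its component
    public int SubtreeSize = 1;  // property C

    public TreeNode Root()
    {
        var n = this;
        while (n.Parent != null) n = n.Parent; // O(depth); O(log n) if the tree stays dense
        return n;
    }

    public static bool SameComponent(TreeNode a, TreeNode b) => a.Root() == b.Root();

    // Attach a currently parentless node under 'parent', pushing the size change upward.
    public static void Attach(TreeNode child, TreeNode parent)
    {
        child.Parent = parent;
        for (var n = parent; n != null; n = n.Parent) n.SubtreeSize += child.SubtreeSize;
    }
}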

yairchu
Hmmm. The idea of using a spanning tree to store the components is one I had not considered. There are a lot of trade-offs associated with making this change. I'm not clear how to DFS from the smaller tree to the larger tree efficiently (without traversing both trees and without repeatedly doing an O(log n) check to see if a node belongs to the other component). If there is no way to do this, then the benefit of starting from the smaller tree is mostly lost.
Brian
+1  A: 

Hi,

This is the same problem as computing (assuming 4-connectivity on a grid) the connected strings of stones in the game of Go (Igo in Japan), and doing it incrementally is one key to a high-performance Go-playing algorithm.

That said, in this domain too the easy case is turning a grid element ON (adding a stone to the board), because that can only join previously unconnected components. The problematic case is turning a grid element OFF (removing a stone because of an undo in the search), since a single component can then be split into several disconnected ones.

Based on my limited understanding of the problem, I would recommend using union-find to merge the labeled groups when you turn an element ON, and recomputing the affected groups from scratch when you turn a grid element OFF. To optimize this, whenever a batch of changes turns elements both ON and OFF, handle the OFF case first so that the union-find operations are not wasted. If you want a more advanced incremental algorithm you can start maintaining incremental connectivity data per element, but it most likely won't pay off.
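
To make the union-find half of that concrete, a cell-keyed sketch (with path halving and union by size; CellUnionFind and Cell are placeholder names) might look like:

// Sketch: union-find over cells for the ON case. Cells that turn OFF are handled
// separately by rebuilding the affected groups, as described above.
class CellUnionFind
{
    private readonly Dictionary<Cell, Cell> parent = new Dictionary<Cell, Cell>();
    private readonly Dictionary<Cell, int> size = new Dictionary<Cell, int>();

    public void MakeSet(Cell c) { if (!parent.ContainsKey(c)) { parent[c] = c; size[c] = 1; } }

    public Cell Find(Cell c)
    {
        while (!parent[c].Equals(c)) { parent[c] = parent[parent[c]]; c = parent[c]; } // path halving
        return c;
    }

    public void Union(Cell a, Cell b)
    {
        Cell ra = Find(a), rb = Find(b);
        if (ra.Equals(rb)) return;
        if (size[ra] < size[rb]) { var t = ra; ra = rb; rb = t; } // union by size
        parent[rb] = ra;
        size[ra] += size[rb];
    }
}

When a cell turns ON you would MakeSet it and Union it with each ON neighbour; when any cell turns OFF you would fall back to rebuilding that cell's group, doing the OFF pass first as suggested so that the unions are not wasted.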

antti.huima