I am working with a grid of squares which have two states, "ON" and "OFF." I have a rather simple Connected Component Labeling algorithm which finds all of the "ON" components. Usually, but not always, there is exactly one "ON" component.

I wish to construct an algorithm which takes in as input a matrix of on/off cells, a component labeling (probably formatted as a list of hashsets of cells), and a list of cells that have changed since the labeling was formed, and output a new labeling. The obvious solution is to just recalculate from scratch, though this is not efficient. In general the list of cells that have changed will be small.

In the case of the change list being only cells that have turned ON, this is easy to do:

Groups G;
Foreach changed cell C:
  Group U = emptygroup;
  U.add(C);
  Foreach Group S in G:          // iterate over a snapshot if G is modified inside the loop
    if (S contains a cell which is adjacent to C)
      G.Remove(S);
      U.UnionWith(S);
  G.add(U);                      // add the merged group U, not the bare cell C
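
In actual code the same loop might look roughly like the following C# sketch (Cell and IsAdjacentTo are stand-ins for whatever cell representation is in use); the two details that matter are that the group list is snapshotted before being modified and that the merged group, not the bare cell, is what gets added back:

// Sketch: merging groups when cells turn ON.
// Cell and Cell.IsAdjacentTo are hypothetical names; requires System.Linq.
void AddOnCells(List<HashSet<Cell>> groups, IEnumerable<Cell> turnedOn)
{
    foreach (Cell c in turnedOn)
    {
        var merged = new HashSet<Cell> { c };
        // Snapshot the touching groups so 'groups' is not modified mid-enumeration.
        var touching = groups.Where(g => g.Any(cell => cell.IsAdjacentTo(c))).ToList();
        foreach (var g in touching)
        {
            groups.Remove(g);
            merged.UnionWith(g);
        }
        groups.Add(merged); // the merged group, not just the cell, rejoins the list
    }
}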

However, if the changes contain any cells which have turned OFF, I am not sure what to do. Keep in mind that all ON cells must be a member of exactly one group. One solution would be to take each cell that is adjacent to the newly OFF cell and see whether those neighbours are still connected to one another (e.g. using A* pathfinding). This yields 1-4 contiguous groups (or 0 groups if the removed cell was the only cell in its group and thus has no ON neighbours to check). However, this is only a little better than starting from scratch, since usually (but not always) connecting these adjacent squares is about as much work as finding a contiguous group outright (unless someone has a suggestion for a smarter way to do that). It also gets a bit scary if there are a lot of changed cells...though I admit there usually aren't.
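
For what it's worth, that neighbour check looks roughly like this as code (a sketch only; Grid.IsOn and Grid.Neighbors are made-up helpers, and the removed cell is assumed to already be marked OFF). Each pass of the outer loop flood fills one of the 0-4 resulting fragments, and neighbours already swallowed by an earlier pass are skipped:

// Sketch: after 'removed' turns OFF, rebuild only the groups around it.
// Grid.IsOn and Grid.Neighbors are assumed helpers, not a real API.
List<HashSet<Cell>> SplitAround(Grid grid, Cell removed)
{
    var fragments = new List<HashSet<Cell>>();
    var seen = new HashSet<Cell>();
    foreach (Cell start in grid.Neighbors(removed))
    {
        if (!grid.IsOn(start) || seen.Contains(start)) continue;
        var fragment = new HashSet<Cell> { start };
        var stack = new Stack<Cell>();
        stack.Push(start);
        while (stack.Count > 0)
            foreach (Cell n in grid.Neighbors(stack.Pop()))
                if (grid.IsOn(n) && fragment.Add(n)) stack.Push(n);
        seen.UnionWith(fragment); // any other neighbour reached here is in the same fragment
        fragments.Add(fragment);
    }
    return fragments;
}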

Context, for those who insist on knowing why I am doing this: one rule in Nurikabe puzzles is that you may only have one contiguous group of walls. The problem above is a simplification of a problem I am trying to solve for increased speed (and to play with pathfinding). Basically, I wish to check for contiguous walls without wasting the information gained from previous such tests. I am trying to see how many places in my solver I can reuse previous information in order to improve speed, since it seems kind of painful to use an O(f(N)) algorithm when an O(f(Δ)) algorithm will suffice (N being the size of the puzzle and Δ being the changes made since the algorithm was last run).

Profiling does indicate that improving this algorithm would make a difference to execution time, but this is a project for fun rather than profit, so it doesn't really matter that much, beyond being able to measure whether the change had any influence.

Note: I have omitted explaining my current algorithm, but it basically does a stack-based flood fill from the first ON square it finds, then checks whether any ON squares were left unvisited (which would mean there is more than one group, which it does not bother to examine further).
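
For reference, that check is roughly the following sketch (Grid.AllCells, Grid.IsOn, and Grid.Neighbors are invented names; requires System.Linq):

// Sketch of the current from-scratch test: flood fill from the first ON square,
// then any ON square left outside the fill means there is more than one group.
bool IsSingleGroup(Grid grid)
{
    var onCells = grid.AllCells().Where(grid.IsOn).ToList();
    if (onCells.Count == 0) return true; // no ON squares at all, so nothing can be split
    var filled = new HashSet<Cell> { onCells[0] };
    var stack = new Stack<Cell>();
    stack.Push(onCells[0]);
    while (stack.Count > 0)
        foreach (Cell n in grid.Neighbors(stack.Pop()))
            if (grid.IsOn(n) && filled.Add(n)) stack.Push(n);
    return filled.Count == onCells.Count; // every ON square was reached from the first one
}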

Edit: Enhancement idea: Yairchu's and John Kugelman's suggestions crystallized in my head into this improvement, which is not actually a solution to this problem per se, but may make this part of the code and several other pieces of code run faster:

Current loop:

// Every visit re-tests the square's type against 'match'.
foreach (Square s in m.Neighbors[tmp.X][tmp.Y])
{
    if (0 != ((byte)(s.RoomType) & match) && Retval.Add(s)) curStack.Push(s);
}

Improvement idea:

// NeighborsXX would already be filtered to the matching type, so no per-visit test.
foreach (Square s in m.NeighborsXX[tmp.X][tmp.Y])
{
    if (Retval.Add(s)) curStack.Push(s);
}

This would require maintaining several m.NeighborsXX instances (one for each type of match that needs to be sped up) and updating them all whenever a square changes. I'd need to benchmark it to see whether it actually helps, but it looks like a standard case of trading some memory for some speed.
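
A rough sketch of what keeping one of those filtered lists in sync might look like (NeighborsWall, Neighbors, and IsWall are invented names):

// Sketch: one filtered neighbour table per square type of interest, updated
// whenever a square changes. Only the lists of the changed square's neighbours are touched.
List<Square>[][] NeighborsWall; // NeighborsWall[x][y] = neighbours of (x, y) that are currently walls

void OnSquareChanged(Square s)
{
    foreach (Square n in Neighbors[s.X][s.Y])
    {
        List<Square> list = NeighborsWall[n.X][n.Y];
        if (IsWall(s)) { if (!list.Contains(s)) list.Add(s); }
        else list.Remove(s);
    }
}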

A: 

Interesting problem! Here's my initial thought. Hopefully I'll have more and will update this answer as they come...

[Update 2] Since you only care about one group, the A* search seems ideal. Have you profiled the A* search vs. relabeling? I have to think that a well-written A* search would be faster than the flood fill. If not perhaps you can post your actual code for optimization help?

[Update 1] If you know the newly OFF cell C is in group G, then you could re-run the CCL algorithm but only have to re-label the cells in group G. The other ON cells could retain their existing labels. You would not have to examine the rest of the grid, which could be significant savings compared to the initial CCL of the entire grid. (As an avid Nurikabe solver myself, this should be at least a 33% savings in a solved puzzle, and a very significant savings in an in-progress puzzle, no? "33%" coming from my guesstimate that solved puzzles are about 2/3 black and 1/3 white.)

To do this you'd have to store a list of cells contained in each group so that you could quickly iterate over the cells in group G and re-label only those cells.
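
In code, restricting the re-labeling to the old group might look something like this sketch (assuming the group is stored as a HashSet<Cell> with the turned-OFF cells already removed from it; Grid.Neighbors is an invented helper and System.Linq is needed for First):

// Sketch: re-run the labeling only over the cells of the group that lost a cell.
List<HashSet<Cell>> RelabelGroup(Grid grid, HashSet<Cell> oldGroup)
{
    var pieces = new List<HashSet<Cell>>();
    var unvisited = new HashSet<Cell>(oldGroup);
    while (unvisited.Count > 0)
    {
        Cell seed = unvisited.First();
        var piece = new HashSet<Cell> { seed };
        var stack = new Stack<Cell>();
        stack.Push(seed);
        while (stack.Count > 0)
            foreach (Cell n in grid.Neighbors(stack.Pop()))
                if (oldGroup.Contains(n) && piece.Add(n)) stack.Push(n); // never leave the old group
        unvisited.ExceptWith(piece);
        pieces.Add(piece);
    }
    return pieces; // these replace the old group in the labeling; other groups keep their labels
}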

John Kugelman
@John: An interesting thought, but since there is usually only one group, this change does very little. Alternatively, I could argue that my current algorithm already has this improvement implicitly, since, once it finds the first group, it gives up if it encounters an ON cell that is not in the group. The "stop if encountering a cell not in the group" could probably be optimized away...but it's so fast that it really doesn't matter.
Brian
Updated... are you sure A* is no better than flood fill?
John Kugelman
Re: A*: switching to A* would probably reduce the number of squares that need to be scanned, but I am hesitant to even test it, as it gets annoying to do when there are a lot of changes. Your comments on savings do bring up the rather interesting idea of maintaining separate neighbor lists depending on the type of neighbor, thus yielding a faster traversal. That would probably bring consistent speedups in several places all over the program.
Brian
+1  A: 

Not a complete solution, but here goes:

  • For each connected component keep a spanning tree in memory
    • Tree property A: Our spanning tree has a notion of which node is "above" which (like in search trees). The choice of which is above which is arbitrary
  • Let's discuss removing and adding edges
  • When adding an edge:
    • Check if the two nodes are in the same component by checking if their trees' roots are the same
      • Tree property B: The tree should be dense so this check would be O(log n)
    • If in same group then do nothing
    • If they are in different groups then join the trees with the new edge.
      • This would require transforming "the shape" (the definition of who is above whom) of one of the trees so that our new edge can be "above" it
  • When removing an edge:
    • If this edge does not participate in a group's spanning tree then do nothing.
    • If it does, we would need to check if the group is still connected
      • DFS from one group to try to reach the other one
      • Better do it from the smaller of the two
        • Tree property C: We maintain for each node in the tree the size of its subtree
        • Using property C we can tell both groups' sizes
      • Because of property B: usually the smaller group will be very small and the larger group will be very large
      • If the groups are connected then we act as if we added the connecting edge
      • If the groups are not connected then we should climb the tree to maintain property C (subtract the size of the previously connected subtree from the ancestors' subtree sizes)
  • Problem: How do we maintain property B (the tree is dense)?

I hope this makes sense :)
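
A minimal sketch of properties A and C only (parent pointers plus subtree sizes; TreeNode and the method names are illustrative, and the re-shaping on join and the edge-removal handling, which are the interesting parts, are left out):

// Sketch of properties A and C: each node knows its parent (null at the root) and the
// size of the subtree below it. Root().SubtreeSize is the whole component's size,
// which is what lets you pick the smaller side for the DFS.
class TreeNode
{
    public TreeNode Parent;      // null means this node is the root of its component
    public int SubtreeSize = 1;  // property C

    public TreeNode Root()
    {
        var n = this;
        while (n.Parent != null) n = n.Parent; // O(depth); O(log n) if the tree stays dense
        return n;
    }

    public static bool SameComponent(TreeNode a, TreeNode b) => a.Root() == b.Root();

    // Attach a currently parentless node under 'parent', pushing the size change upward.
    public static void Attach(TreeNode child, TreeNode parent)
    {
        child.Parent = parent;
        for (var n = parent; n != null; n = n.Parent) n.SubtreeSize += child.SubtreeSize;
    }
}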

yairchu
Hmmm. The idea of using a spanning tree to store the components is one I had not considered. There are a lot of trade-offs associated with making this change. I'm not clear how to DFS from the smaller tree to the larger tree efficiently (without traversing both trees and without repeatedly doing an O(log n) check to see if a node belongs to the other component). If there is no way to do this, then the benefit of starting from the smaller tree is mostly lost.
Brian
+1  A: 

Hi,

This is the same problem as computing (assuming 4-connectivity on a grid) the connected strings of stones in the game of Go (Igo in Japan), and doing it incrementally is one key to a high-performance Go-playing algorithm.

That said, in this domain too the easy case is turning a grid element ON (adding a stone to the board), because that can only join previously unconnected components. The problematic case is turning a grid element OFF (removing a stone because of an undo in the search), since a single component can then be split into several disconnected ones.

Based on my limited understanding of the problem, I would recommend using union-find to merge the labeled groups when you turn an element ON, and recomputing the affected groups from scratch when you turn a grid element OFF. To optimize this, whenever a batch of changes turns elements both ON and OFF, handle the OFF case first so that the union-find operations are not wasted. If you want a more advanced incremental algorithm you can start maintaining incremental connectivity data per element, but it most likely won't pay off.
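
To make the union-find half of that concrete, a cell-keyed sketch (with path halving and union by size; CellUnionFind and Cell are placeholder names) might look like:

// Sketch: union-find over cells for the ON case. Cells that turn OFF are handled
// separately by rebuilding the affected groups, as described above.
class CellUnionFind
{
    private readonly Dictionary<Cell, Cell> parent = new Dictionary<Cell, Cell>();
    private readonly Dictionary<Cell, int> size = new Dictionary<Cell, int>();

    public void MakeSet(Cell c) { if (!parent.ContainsKey(c)) { parent[c] = c; size[c] = 1; } }

    public Cell Find(Cell c)
    {
        while (!parent[c].Equals(c)) { parent[c] = parent[parent[c]]; c = parent[c]; } // path halving
        return c;
    }

    public void Union(Cell a, Cell b)
    {
        Cell ra = Find(a), rb = Find(b);
        if (ra.Equals(rb)) return;
        if (size[ra] < size[rb]) { var t = ra; ra = rb; rb = t; } // union by size
        parent[rb] = ra;
        size[ra] += size[rb];
    }
}

When a cell turns ON you would MakeSet it and Union it with each ON neighbour; when any cell turns OFF you would fall back to rebuilding that cell's group, doing the OFF pass first as suggested so that the unions are not wasted.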

antti.huima