tags:

views:

154

answers:

3

There is a 2d array of items (in my case they are called Intersections).

A certain item is given as a start. The task is to find all items directly or indirectly connected to this item that satisfy a certain function.

So the basic algorithm is like this:

Add the start to the result list. Repeat until no modification: Add each item in the array that satisfies the function and touches any item in the result list to the result list.

My current implementation looks like this:

private IList<Intersection> SelectGroup (
    Intersection start,
    Func<Intersection, Intersection, bool> select)
{
    List<Intersection> result = new List<Intersection> ();

    Queue<Intersection> source = new Queue<Intersection> ();
    source.Enqueue (start);

    while (source.Any ()) {
        var s = source.Dequeue ();
        result.Add (s);

        foreach (var neighbour in Neighbours (s)) {
            if (select (start, neighbour)
                && !result.Contains (neighbour)
                && !source.Contains (neighbour)) {
                source.Enqueue (neighbour);
            }
        }
    }

    Debug.Assert (result.Distinct ().Count () == result.Count ());
    Debug.Assert (result.All (x => select (x, result.First ())));

    return result;
}

private List<Intersection> Neighbours (IIntersection intersection)
{
    int x = intersection.X;
    int y = intersection.Y;

    List<Intersection> list = new List<Intersection> ();

    if (x > 1) {
        list.Add (GetIntersection (x - 1, y));
    }
    if (y > 1) {
        list.Add (GetIntersection (x, y - 1));
    }
    if (x < Size) {
        list.Add (GetIntersection (x + 1, y));
    }
    if (y < Size) {
        list.Add (GetIntersection (x, y + 1));
    }

    return list;
}

(The select function takes a start item and returns true iff the second item satisfies.)

This does its job and turned out to be reasonable fast for the usual array sizes (about 20*20). However, I'm interested in further improvements. Any ideas?

Example (X satisfies in relation to other Xs, . does never satisfy):

....
XX..
.XX.
X...

In this case, there are 2 groups: a central group of 4 items and a group of a single item in the lower left. Selecting the group (for instance by starting item [3, 3]) returns the former, while the latter can be selected using the starting item and sole return value [1, 4].

Example 2:

.A..
..BB
A.AA

This time there are 4 groups. The 3 A groups are not connected, so they are returned as separate groups. The bigger A and B groups are connected, but A does not related to B so they are returned as separate groups.

+1  A: 

Step 1: trivial change for huge benefit
Simple, immediate improvement: Both your result.Contains and source.Contains memberships are on list types, so they'll be O(n) in the size of those lists, not very efficient. Since you really don't care about any particular ordering, I'd change both of those into HashSet for constant-time lookups.
Note that your current implementation would be O(n^2) in the worst case, which happens when the entire array is valid (by the time you're inserting the last few elements, you'd be checking each one against the entire rest of the grid).

Step 2: further optimization
Better structural change: Keep a boolean visited array of the same size as your Intersection array, and every time you look at an Intersection, mark it as visited. This way, you don't have to check if something's in result or source every time, and better yet, you don't have to re-evaluate your select predicate. Otherwise, given something like this:

XXX
X.X
XXX

you'd be evaluating select on the center dot four times, which could be bad if it's expensive. This way, your

if (select (start, neighbour)
  && !result.Contains (neighbour)
  && !source.Contains (neighbour))

condition turns into: if (!visited(neighbour) && select(start, neighbour), which will only evaluate select at most once on any given intersection.
Also, if you do this, you don't even need to make result and contains hashes any more, since you won't be doing containment-checks on them.

tzaman
Parallel arrays are bad. Rather than have two arrays of the same size that you have to keep synchronized, just make a class to pair the bool with your object and store that all in one array. Then you don't have to worry about keeping two arrays sync'd.
Joel
Good point, that'd work better.
tzaman
A: 

I am not really good at C# but reading your algorithm description, I can have a few suggestions for you:

1/ Do you know Dynamic programming or Bellman Algorithm? The idea is that for each iteration, you only get the newest result and continue searching. Dynamic Programming

For example :

1 st iteration : x1, x2

2 nd iteration : x3, x4

the third iteration, you only search and compare back with x3 and x4. It will save you a lot of CPU calculation and iteration

2/ Use HashSet to have a better searching and getting data for contains

vodkhang
I don't see how the approach in `1` reduces the overall time required?
mafutrct
I read the source carefully again. I am not good at C# but I think that this line can cause some performance problem: while (source.Any ()) . Because everytime, you enqueue the source, you loop over some element (s) again, is it right? So that, some element in source will be used to check many times and I thought that it is not necessary.
vodkhang
`Any` is highly optimized, it performs a simple and fast check if there is any element in the list (faster than, for instance, list.Count). Every element is only enqueued once at most, the algorithm guarantees that.
mafutrct
Ok,I got it. Thanks for that
vodkhang
A: 

Based on tzaman's answer, I used a HashSet. Since the queue does not support hashing, I found an improvement that only requires lookup in the result list instead of both result and source list:

HashSet<Intersection> result = new HashSet<Intersection> ();
result.Add (start); // need to add first item in advance

Queue<Intersection> source = new Queue<Intersection> ();
source.Enqueue (start);

while (source.Any ()) {
    var s = source.Dequeue ();

    foreach (var neighbour in Neighbours (s)) {
        if (select (start, neighbour) && !result.Contains (neighbour)) {
            result.Add (neighbour); // add immediately to hashset
            source.Enqueue (neighbour);
        }
    }
}
mafutrct