Hi guys,

Currently, I am working on a project that groups 3D points from a dataset, with connectivity defined by a maximum Euclidean distance: two points belong to the same group if they are within that distance of each other. My algorithm right now is simply a 3D adaptation of naive flood fill.

size_t PointSegmenter::growRegion(size_t & seed, size_t segNumber) {
    size_t numPointsLabeled = 0;

    //alias for points to avoid retyping
    vector<Point3d> & points = _img.points;
    deque<size_t> ptQueue;
    ptQueue.push_back(seed);
    //label points as they are enqueued so nothing is queued twice
    points[seed].setLabel(segNumber);
    while (!ptQueue.empty()) {
        size_t currentIdx = ptQueue.front();
        ptQueue.pop_front();
        //already labeled when it was enqueued, so just count it
        numPointsLabeled++;
        vector<int> newPoints = _img.queryRadius(currentIdx, SEGMENT_MAX_DISTANCE, MATCH_ACCURACY);
        for (int i = 0; i < (int)newPoints.size(); i++) {
            int newIdx = newPoints[i];
            Point3d &newPoint = points[newIdx];
            if(!newPoint.labeled()) {
                newPoint.setLabel(segNumber);
                ptQueue.push_back(newIdx);
            }
        }
    }

    //NOTE to whoever wrote the other code, the compiler optimizes i++ 
    //to ++i in cases like these, so please don't change them just for speed :)
    //search for an unlabeled point to serve as the next seed
    for (size_t i = seed; i < points.size(); i++) {
        if(!points[i].labeled()) {
            seed = i;
            return numPointsLabeled;
        }
    }
    return numPointsLabeled;
}

This snippet is run again for each new seed. _img.queryRadius() is a fixed-radius search using the ANN library:

vector<int> Image::queryRadius(size_t index, double range, double epsilon) {
    //ANN takes the squared radius; the first call just counts the points
    //in range, the second retrieves them into nnIdx
    int k = kdTree->annkFRSearch(dataPts[index], range*range, 0, NULL, NULL, epsilon);
    ANNidxArray nnIdx = new ANNidx[k];
    kdTree->annkFRSearch(dataPts[index], range*range, k, nnIdx, NULL, epsilon);
    vector<int> outPoints;
    outPoints.reserve(k);
    for(int i = 0; i < k; i++) {
        outPoints.push_back(nnIdx[i]);
    }
    delete[] nnIdx;
    return outPoints;
}

My problem with this code is that it runs way too slow for large datasets. If I'm not mistaken, it does a search for every single point; each fixed-radius kd-tree search costs roughly O(log N) plus the number of results it returns, so with N points the whole thing is at least O(N log N), and in practice much worse because of the repeated work described below.

In addition, if I remember right, deletions from kd-trees are relatively expensive, but not deleting points creates its own problem: each point can be searched hundreds of times, once by every neighbor close to it.

So my question is, is there a better way to do this? Especially one whose running time grows linearly with the dataset?

Thanks for any help you may be able to provide

EDIT I have tried using a simple sorted list like dash-tom-bang said, but the result was even slower than what I was using before. I'm not sure if it was my implementation, or if it is simply too slow to iterate through every point and check the Euclidean distance (even when using only the squared distance).

Does anyone have any other ideas? I'm honestly stumped right now.

+2  A: 

When I did something along these lines, I chose an "origin" outside of the dataset somewhere and sorted all of the points by their distance to that origin. Then I had a much smaller set of points to choose from at each step: I only had to go through the "onion skin" region around the point being considered, checking neighbors outward in sorted order until the difference in the sort key exceeded the search radius, at which point no farther point can possibly be within range.

While that worked well for me, a similar version can be achieved by sorting all of your points along one axis (which corresponds to pushing the "origin" infinitely far away) and then checking points on either side of the query point until the distance along that axis alone exceeds your "search width", as in the sketch below.
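
A minimal sketch of the one-axis version (the point struct here is an illustrative stand-in for your Point3d class, and 'order' is a prebuilt index array sorted by x; this is my rough illustration, not tested production code):

    #include <vector>

    struct Pt { double x, y, z; }; //stand-in for Point3d

    //indices of all points within 'radius' of points[order[pos]], where
    //'order' holds every point index sorted by ascending x coordinate
    std::vector<size_t> radiusQuery(const std::vector<Pt>& points,
                                    const std::vector<size_t>& order,
                                    size_t pos, double radius)
    {
        const Pt& q = points[order[pos]];
        const double r2 = radius * radius;
        std::vector<size_t> result;
        //walk outward in both directions; once the x distance alone
        //exceeds the radius, no farther point can possibly qualify
        for (size_t i = pos + 1; i < order.size(); ++i) {
            const Pt& p = points[order[i]];
            if (p.x - q.x > radius) break; //"search width" exceeded
            double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
            if (dx*dx + dy*dy + dz*dz <= r2) result.push_back(order[i]);
        }
        for (size_t i = pos; i-- > 0; ) {
            const Pt& p = points[order[i]];
            if (q.x - p.x > radius) break;
            double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
            if (dx*dx + dy*dy + dz*dz <= r2) result.push_back(order[i]);
        }
        return result;
    }

'order' would be built once up front with std::sort and a comparator on x; the distance-to-origin version is identical apart from the sort key.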

dash-tom-bang
That's a really nice trick, thanks! How did you organize your dataset though?
Xzhsh
I think at the time I stuffed it all into a map, but just a sorted vector (std::sort with an appropriate functor) would do the trick too if you weren't going to be changing your dataset frequently.
dash-tom-bang
+2  A: 

The points should be better organized. To search more efficiently, instead of a vector<Point3d> you need some sort of hash map where a hash collision implies that two points are close to each other (so you use hash collisions to your advantage). You can, for instance, divide the space into cubes with side length equal to SEGMENT_MAX_DISTANCE, and use a hash function that returns a triplet of ints instead of just one int, where each component of the triplet is calculated as point.<corresponding_dimension> / SEGMENT_MAX_DISTANCE.

Now for each point in this new set you search only the points in the same cube and in the adjacent cubes. This greatly reduces the search space; a sketch follows.
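
A rough sketch of this cube hashing (the key type, hash function, and point struct below are illustrative choices of mine, not existing code):

    #include <cmath>
    #include <cstdint>
    #include <functional>
    #include <unordered_map>
    #include <vector>

    struct Pt { double x, y, z; }; //stand-in for Point3d

    struct Cell {
        int64_t x, y, z;
        bool operator==(const Cell& o) const { return x == o.x && y == o.y && z == o.z; }
    };

    struct CellHash {
        size_t operator()(const Cell& c) const {
            //simple mixing of the three cube coordinates
            return std::hash<int64_t>()(c.x)
                 ^ (std::hash<int64_t>()(c.y) * 31)
                 ^ (std::hash<int64_t>()(c.z) * 1009);
        }
    };

    using Grid = std::unordered_map<Cell, std::vector<size_t>, CellHash>;

    Cell cellOf(const Pt& p, double cellSize) {
        return { (int64_t)std::floor(p.x / cellSize),
                 (int64_t)std::floor(p.y / cellSize),
                 (int64_t)std::floor(p.z / cellSize) };
    }

    //bucket every point exactly once; expected O(N)
    Grid buildGrid(const std::vector<Pt>& points, double cellSize) {
        Grid grid;
        for (size_t i = 0; i < points.size(); ++i)
            grid[cellOf(points[i], cellSize)].push_back(i);
        return grid;
    }

    //candidate neighbors for a radius <= cellSize query: the point's own
    //cube plus the 26 adjacent cubes; exact distances still need checking
    std::vector<size_t> candidates(const Grid& grid, const Pt& p, double cellSize) {
        std::vector<size_t> out;
        Cell c = cellOf(p, cellSize);
        for (int64_t dx = -1; dx <= 1; ++dx)
            for (int64_t dy = -1; dy <= 1; ++dy)
                for (int64_t dz = -1; dz <= 1; ++dz) {
                    auto it = grid.find({c.x + dx, c.y + dy, c.z + dz});
                    if (it != grid.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
        return out;
    }

With cellSize equal to SEGMENT_MAX_DISTANCE, every true neighbor is guaranteed to land in one of those 27 cubes, so queryRadius only has to distance-check the candidates, and building the grid is a single linear pass.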

Dialecticus
Would this be somewhat similar to a gridding solution?
Xzhsh
Yes, you group points by their position in a 3D grid. The purpose of the described hash map is that if you have the location of one grid cell then you immediately have all the points in that cell, and can also reach all adjacent cells. And if you have a point then you immediately know its grid cell.
Dialecticus
Maybe I should point out that my answer is not another solution, but an improvement of your current solution. Because _img.points is better organized, _img.queryRadius should be changed to use that organization to query only those points that are good candidates. If you don't feel like changing _img.points from vector<Point3d> to the proposed structure, you can still use the vector with the points grouped by grid cell, plus a structure that just stores the boundary of each cell's segment within the vector. I could write some example code if you want.
Dialecticus
... however I would need the code for kdTree->annkFRSearch then.
Dialecticus
+3  A: 
ybungalobill
+1. I like this method. Use http://www.cgal.org/ for step 1.
Alexandre C.
@Alexandre: I looked at cgal but couldn't find the implementation complexity. Do you have a link?
ybungalobill
Alexandre C.
Pretty cool solution, definitely gave me a new perspective on this problem. Thanks!
Xzhsh