views:

91

answers:

4

hi

This is related to this question

I have a vector of points that will, for instance, store 100K+ points.

std::vector<Point> point_vec; 

I want to check if a position (x, y, z) to be added is already in point_vec (represented by an instance of class Point ). The following function will check this (in a for loop)

bool samePoints(const Point& p1, 
                const double x1, 
                const double y1, 
                const double z1) {

     return fabs(p1.x - x1) < EPSILON &&
            fabs(p1.y - y1) < EPSILON &&
            fabs(p1.z - z1) < EPSILON;
}

But, I guess checking if x in list/ vector will be a costly operation. I want to know if there is a better way to check if a) Points are equal (perhaps an operator "=" on class Point?? and (b) Is there some other , better data structure than `vector' that I should make use of. OR if there is an operation on vector that will speed up checking for 'same points'

For (b) please note that I need to std::sort these points as well. So any other data structure that you may suggest should be able to sort these points.

UPDATE

I want the points in the sorted state only. It is just that vector doesn't sort those (so I need to perform a sort operation. Should I use std::set<Point> point_set instead of instead of std::vector <Point> point_vec ? IF SO: Is adding each point to a set a costly operation ? Or 'vector' sorting that I do in the end turns out costlier overall?

+2  A: 

Is the collection of points always sorted or must it sometimes be in an unsorted state? If the former, a std:;set of points will be fast to determine if a point is already there, and will maintain the points in sorted order. You would have to supply a suitable ordering function (not an equality one), which may be faster than your equality test.

anon
@Neil: hi! The collection of points is *not* in the sorted state from the beginning..But, finally, I do want all points to be 'sorted' Thus, I don't really need 'unsorted' points. so do you recomment using `std::set <Point> point_set` instead of `std::vector <Point> point_vec` in the first place?
memC
@memC If sorting and lookup speed is the most important thing yes. but sets don't support other operations, such as random access using op[] that you may need. An alternative is to maintain the vector in sorted order and to use binary search to check if a particular point is there.
anon
@Neil: sorry but what did you mean by a 'suitable ordering function'?
memC
@memC You need to provide the equivalent of operator <() for your point class (or in fact provide operator<()) - it should return true if one point is in a sense less than another one, false otherwise.
anon
@Neil: I see. Thank you. BTW, if I am to use vector and if I just need to find out minimum value of the 3D point, is there something in STL similar to `min_element(vec.begin(), vec.end())` that will allow me to pass a functor as a 3rd argument (like done for `std::sort` ??
memC
@memC The std::find_if algorithm does this, but it will simply perform a linear search. If you have sorted your vector, you could use std::binary_search or std::equal_range. But obviously, if you have sorted, the min element will be the first :-)
anon
A: 

std::set sorts the items internally. You would only need to supply compare function.

Vlad
A: 

An STL set or hash_set might help. Building up the structure will be slower than building up a vector or list, but looking up points will be much faster.

Alternatively, you could also use a quad-tree or octo-tree.

Patrick
Note: hash_set will only work if your points have integer coordinates.
Patrick
@Neil, I actually meant sets and hash_sets, not maps (I was a bit too fast on this).
Patrick
+1  A: 

I'm afraid that while Niel and Vlad have the right idea, they've left out a particularly important detail: You can't sensibly order N-dimensional points with a 1-dimensional comparator if you want to be correct only to within epsilon instead of exactly correct.

Patrick is on target with quad/oct-trees, but not with hashing--hashing implicitly requires a mapping where all items close in 3D must be mapped close in 1D (along the hash), which is provably impossible.

So, assuming that you can start with a set sorted linearly using a comparator like

x1<x2 || (x1==x2 && (y1<y2 || (y1==y2 && z1<z2))

here's what you need to do:

  • Take your vector and subtract epsilon along every dimension. Do a binary search for the index of that element (if any). Call that i0
  • Take your vector and add epsilon along every dimension. Do a binary search for that index. Call that i1
  • Search everything between i0 and i1 linearly to see if any of those are actually within epsilon of your vector in every dimension (not just in the 1D sort direction).

If your points are randomly distributed, this will tend to be faster than constructing an oct-tree. If your points tend to fall along lines (e.g. x1==x2, y1==y2 for many z), then this scheme will often pick out huge chunks of the list as indistinguishable using a linear sort, and you should instead use an oct-tree for fast searching.

Rex Kerr