tags:

views:

914

answers:

19

Using C++ (and Qt), I need to process a big amount of 3D coordinates.

Specifically, when I receive a 3D coordinate (made of 3 doubles), I need to check in a list if this coordinate has already been processed. If not, then I process it and add it to the list (or container).

The amount of coordinates can become very big, so I need to store the processed coordinates in a container which will ensure that checking if a 3D coordinate is already contained in the container is fast.

I was thinking of using a map of a map of a map, storing the x coordinate, then the y coordinate then the z coordinate, but this makes it quite tedious to use, so I'm actually hoping there is a much better way to do it that I cannot think of.

+1  A: 

Well, it depends on what's most important... if a tripple map is too tedious to use, then is implementing other data structures not worth the effort?

If you want to get around the uglyness of the tripple map solution, just wrap it up in another container class with an access function with three parameter, and hide all the messing around with maps internally in that.

If you're more worried about the runtime performance of this thing, storing the coordinates in an Octree might be a good idea.

Also worth mentioning is that doing these sorts of things with floats or doubles you should be very careful about precision -- if (0, 0, 0.01) the same coordinate as (0, 0, 0.01000001)? If it is, you'll need to look at the comparison functions you use, regardless of the data structure. That also depends on the source of your coordinates I guess.

slicedlime
+1  A: 

Are you expecting/requiring exact matches? These might be hard to enforce with doubles. For example, if you have processed (1.0, 1.0, 1.0) and you then receive (0.9999999999999, 1.0, 1.0) would you consider it the same? If so, you will need to either apply some kind of approximation or else define error bounds.

However, to answer the question itself: the first method that comes to mind is to create a single index (either a string or a bitstring, depending how readable you want things to be). For example, create the string "(1.0,1.0,1.0)" and use that as the key to your map. This will make it easy to look up the map, keeps the code readable (and also lets you easily dump the contents of the map for debugging purposes) and gives you reasonable performance. If you need much faster performance you could use a hashing algorithm to combine the three coordinates numerically without going via a string.

Leigh Caldwell
+1  A: 

How about using a boost::tuple for the coordinates, and storing the tuple as the index for the map?

(You may also need to do the divide-by-epsilon idea from this answer.)

Head Geek
A: 

Pick a constant to scale the coordinates by so that 1 unit describes an acceptably small box and yet the integer part of the largest component by magnitude will fit into a 32-bit integer; convert the X, Y and Z components of the result to integers and hash them together. Use that as a hash function for a map or hashtable (NOT as an array index, you need to deal with collisions).

You may also want to consider using a fudge factor when comparing the coordinates, since you may get floating point values which are only slightly different, and it is usually preferable to weld those together to avoid cracks when rendering.

moonshadow
+1  A: 

Use any unique transformation of your 3D coordinates and store only the list of the results.

Example: md5('X, Y, Z') is unique and you can store only the resulting string.

The hash is not a performant idea but you get the concept. Find any methematic unique transformation and you have it.

/Vey

Veynom
A: 

If you write a helper class with a simple public interface, that greatly reduces the practical tedium of implementation details like use of a map<map<map<>>>. The beauty of encapsulation!

That said, you might be able to rig a hashmap to do the trick nicely. Just hash the three doubles together to get the key for the point as a whole. If you're concerned about to many collisions between points with symmetric coordinates (e.g., (1, 2, 3) and (3, 2, 1) and so on), just make the hash key asymmetric with respect to the x, y, and z coordinates, using bit shift or some such.

zweiterlinde
+5  A: 

Probably the simplest way to speed up such processing is to store the already-processed points in Octree. Checking for duplication will become close to logarithmic.

Also, make sure you tolerate round-off errors by checking the distance between the points, not the equality of the coordinates.

Vytautas Shaltenis
+1  A: 

You could use a hash_set of any hashable type - for example, turn each tuple into a string "(x, y, z)". hash_set does fast lookups but handles collisions well.

Tyler
A: 

Whatever your storage method, I would suggest you decide on an epsilon (minimum floating point distance that differentiates two coordinates), then divide all coordinates by the epsilon, round and store them as integers.

ΤΖΩΤΖΙΟΥ
A: 

Something in this direction maybe:

struct Coor {
    Coor(double x, double y, double z)
    : X(x), Y(y), Z(z) {}
    double X, Y, Z;
}

struct coords_thesame
{
  bool operator()(const Coor& c1, const Coor& c2) const {
    return c1.X == c2.X && c1.Y == c2.Y && c1.Z == c2.Z;
  }
};

std::hash_map<Coor, bool, hash<Coor>, coords_thesame> m_SeenCoordinates;

Untested, use at your own peril :)

Roel
Doubles aren't compared that way in real life...
Vulcan Eager
Well it depends on where the coordinates come from, if the rounding is done at the point where the coordinates are made, you could use == on doubles. If they are calculated 'free form', then yes, you'd have to compare against an epsilon.
Roel
A: 

Assuming you already have a Coordinate class, add a hash function and maintain a hash_set of the coordinates. Would look something like:

struct coord_eq
{
  bool operator()(const Coordinate &s1, const Coordinate &s2) const
  {
    return s1 == s2;
    // or: return s1.x() == s2.x() && s1.y() == s2.y() && s1.z() == s2.z();
  }
};

struct coord_hash
{
  size_t operator()(const Coordinate &s) const
  {
     union {double d, unsigned long ul} c[3];
     c[0].d = s.x();
     c[1].d = s.y();
     c[2].d = s.z();
     return static_cast<size_t> ((3 * c[0].ul) ^ (5 * c[1].ul) ^ (7 * c[2].ul));
  }
};

std::hash_map<Coordinate, coord_hash, coord_eq> existing_coords;
finnw
+2  A: 

Divide your space into discrete bins. Could be infinitely deep squares, or could be cubes. Store your processed coordinates in a simple linked list, sorted if you like in each bin. When you get a new coordinate, jump to the enclosing bin, and walk the list looking for the new point.

Be wary of floating point comparisons. You need to either turn values into integers (say multiply by 1000 and truncate), or decide how close 2 values are to be considered equal.

Ed
A: 

You can easily define a comparator for a one-level std::map, so that lookup becomes way less cumbersome. There is no reason of being afraid of that. The comparator defines an ordering of the _Key template argument of the map. It can then also be used for the multimap and set collections.

An example:

#include <map>
#include <cassert>


struct Point {
  double x, y, z;
};

struct PointResult {
};

PointResult point_function( const Point& p ) { return PointResult(); }

// helper: binary function for comparison of two points
struct  point_compare {
  bool operator()( const Point& p1, const Point& p2 ) const {
    return p1.x < p2.x
      || ( p1.x == p2.x && ( p1.y < p2.y 
      || ( p1.y == p2.y && p1.z < p2.z ) 
      )
      );
  }
};

typedef std::map<Point, PointResult, point_compare> pointmap;

int _tmain(int argc, _TCHAR* argv[])
{

pointmap pm;

Point p1 = { 0.0, 0.0, 0.0 };
Point p2 = { 0.1, 1.0, 1.0 };

pm[ p1 ] = point_function( p1 );
pm[ p2 ] = point_function( p2 );
assert( pm.find( p2 ) != pm.end() );
    return 0;
}
xtofl
+1  A: 

Use an std::set. Define a type for the 3d coordinate (or use a boost::tuple) that has operator< defined. When adding elements, you can add it to the set, and if it was added, do your processing. If it was not added (because it already exists in there), do not do your processing.

However, if you are using doubles, be aware that your algorithm can potentially lead to unpredictable behavior. IE, is (1.0, 1.0, 1.0) the same as (1.0, 1.0, 1.000000001)?

A: 

There are more than a few ways to do it, but you have to ask yourself first what are your assumptions and conditions.

So, assuming that your space is limited in size and you know what is the maximum accuracy, then you can form a function that given (x,y,z) will convert them to a unique number or string -this can be done only if you know that your accuracy is limited (for example - no two entities can occupy the same cubic centimeter). Encoding the coordinate allows you to use a single map/hash with O(1).

If this is not tha case, you can always use 3 embedded maps as you suggested, or go towards space division algorithms (such as OcTree as mentioned) which although given O(logN) on a average search, they also give you additional information you might want (neighbors, population, etc..), but of course it is harder to implement.

Adi
+3  A: 

You can easily use a set as follows:

#include <set>
#include <cassert>

const double epsilon(1e-8);

class Coordinate {
public:
Coordinate(double x, double y, double z) :
  x_(x), y_(y), z_(z) {}

private:
double x_;
double y_;
double z_;

friend bool operator<(const Coordinate& cl, const Coordinate& cr);
};

bool operator<(const Coordinate& cl, const Coordinate& cr) {
  if (cl.x_ < cr.x_ - epsilon) return true;
  if (cl.x_ > cr.x_ + epsilon) return false;

  if (cl.y_ < cr.y_ - epsilon) return true;
  if (cl.y_ > cr.y_ + epsilon) return false;

  if (cl.z_ < cr.z_ - epsilon) return true;

  return false;

}

typedef std::set<Coordinate> Coordinates;

// Not thread safe!
// Return true if real processing is done
bool Process(const Coordinate& coordinate) {
  static Coordinates usedCoordinates;

  // Already processed?
  if (usedCoordinates.find(coordinate) != usedCoordinates.end()) {
    return false;
  }

  usedCoordinates.insert(coordinate);

  // Here goes your processing code



  return true;

}

// Test it
int main() {
  assert(Process(Coordinate(1, 2, 3)));
  assert(Process(Coordinate(1, 3, 3)));
  assert(!Process(Coordinate(1, 3, 3)));
  assert(!Process(Coordinate(1+epsilon/2, 2, 3)));
}
A: 

You can either use a std::set of 3D coordinates, or a sorted std::vector. Both will give you logarithmic time lookup. In either case, you'll need to implement the less than comparison operator for your 3D coordinate class.

A: 

Why bother? What "processing" are you doing? Unless it's very complex, it's probably faster to just do the calculation again, rather then waste time looking things up in a huge map or hashtable.

This is one of the more counter-intuitive things about modern cpu's. Computation is fast, memory is slow.

I realize this isn't really an answer to your question, it's questioning your question.

hyperlogic
but (early optimization == root(evil)) != (Performance doesen't matter)
sum1stolemyname
A: 

Good question... it's one that has many solutions, because this type of problem comes up many times in Graphical and Scientific applications.

Depending on the solution you require it may be rather complex under the hood, in this case less code doesn't necessarily mean faster.

"but this makes it quite tedious to use" --- generally, you can get around this by typedefs or wrapper classes (wrappers in this case would be highly recommended).

If you don't need to use the 3D co-ordinates in any kind of spacially significant way ( things like "give me all the points within X distance of point P") then I suggest you just find a way to hash each point, and use a single hash map... O(n) creation, O(1) access (checking to see if it's been processed), you can't do much better than that.

If you do need more spacial information you'll need a container that explicitly takes it into account. The type of container you choose will be dependant on your data set. If you have good knowledge of the range of values that you recieve this will help.

If you are recieving well distributed data over a known range... go with octree.

If you have a distribution that tends to cluster, then go with k-d trees. You'll need to rebuild a k-d tree after inputting new co-ordinates (not necessarily every time, just when it becomes overly imbalanced). Put simply, Kd-trees are like Octrees, but with non uniform division.