I'm trying to implement an octree, and for that, I need a fast AABB-ray intersection algorithm. After some searching, I came across this paper that seemed to offer that. From the source code, available here, I translated the pluecker_cls_cff
function to C# as this:
public bool Intersect_2(ref RayPluecker r)
{
switch (r.Classification)
{
// 7 same-ish cases snipped
case Classification.PPP:
return !((r.Position.X > this.Max.X) || (r.Position.Y > this.Max.Y) || (r.Position.Z > this.Max.Z) ||
(r.PlueckerCoefficient.X + r.Direction.X * this.Max.Y - r.Direction.Y * this.Min.X < 0) ||
(r.PlueckerCoefficient.X + r.Direction.X * this.Min.Y - r.Direction.Y * this.Max.X > 0) ||
(r.PlueckerCoefficient.Y + r.Direction.X * this.Min.Z - r.Direction.Z * this.Max.X > 0) ||
(r.PlueckerCoefficient.Y + r.Direction.X * this.Max.Z - r.Direction.Z * this.Min.X < 0) ||
(r.PlueckerCoefficient.Z - r.Direction.Z * this.Min.Y + r.Direction.Y * this.Max.Z < 0) ||
(r.PlueckerCoefficient.Z - r.Direction.Z * this.Max.Y + r.Direction.Y * this.Min.Z > 0));
}
return false;
}
This seems to work fine, but it seemed fairly slow to me (250ms to do 10 million intersects) so I tried some micro-benchmarking with different varieties. In one, I removed the negation that is right after the return
statement and reversed all comparisons (>
to <
and visa versa).
It's now:
case Classification.PPP:
return ((r.Position.X < this.Max.X) || (r.Position.Y < this.Max.Y) || (r.Position.Z < this.Max.Z) ||
(r.PlueckerCoefficient.X + r.Direction.X * this.Max.Y - r.Direction.Y * this.Min.X > 0) ||
(r.PlueckerCoefficient.X + r.Direction.X * this.Min.Y - r.Direction.Y * this.Max.X < 0) ||
(r.PlueckerCoefficient.Y + r.Direction.X * this.Min.Z - r.Direction.Z * this.Max.X < 0) ||
(r.PlueckerCoefficient.Y + r.Direction.X * this.Max.Z - r.Direction.Z * this.Min.X > 0) ||
(r.PlueckerCoefficient.Z - r.Direction.Z * this.Min.Y + r.Direction.Y * this.Max.Z > 0) ||
(r.PlueckerCoefficient.Z - r.Direction.Z * this.Max.Y + r.Direction.Y * this.Min.Z < 0));
This should give the same result, right? It seemed so, as it returns the same results as the negated version with a couple of test cases. However, in the benchmark, it was 5x faster (50ms to do 10 million intersects)! I'm sure it wasn't being optimized out, my benchmark looks like this:
for (int i = 0; i < 10000000; i++)
{
if (!box.Intersect_3(ref ray))
{
throw new Exception();
}
}
What can explain this huge difference? I'm running .NET 4.0 on x86.