views:

1382

answers:

8

I'm writing a comparatively straightforward raytracer/path tracer in D (http://dsource.org/projects/stacy), but even with full optimization it still needs several thousand processor cycles per ray. Is there anything else I can do to speed it up? More generally, do you know of good optimizations / faster approaches for ray tracing?

Edit: this is what I'm already doing.

  • Code is already running highly parallel
  • temporary data is structured in a cache-efficient fashion as well as aligned to 16b
  • Screen divided into 32x32-tiles
  • Destination array is arranged in such a way that all subsequent pixels in a tile are sequential in memory
  • Basic scene graph optimizations are performed
    • Common combinations of objects (plane-plane CSG as in boxes) are replaced with preoptimized objects
  • Vector struct capable of taking advantage of GDC's automatic vectorization support
  • Subsequent hits on a ray are found via lazy evaluation; this prevents needless calculations for CSG
  • Triangles neither supported nor priority. Plain primitives only, as well as CSG operations and basic material properties
  • Bounding is supported
+5  A: 

The typical first order improvement of raytracer speed is some sort of spatial partitioning scheme. Based only on your project outline page, it seems you haven't done this.

Probably the most usual approach is an octree, but the best approach may well be a combination of methods (e.g. spatial partitioning trees and things like mailboxing). Bounding box/sphere tests are a quick cheap and nasty approach, but you should note two things: 1) they don't help much in many situations and 2) if your objects are already simple primitives, you aren't going to gain much (and might even lose). You can more easily (than octree) implement a regular grid for spatial partitioning, but it will only work really well for scenes that are somewhat uniformly distributed (in terms of surface locations)

A lot depends on the complexity of the objects you represent, your internal design (i.e. do you allow local transforms, referenced copies of objects, implicit surfaces, etc), as well as how accurate you're trying to be. If you are writing a global illumination algorithm with implicit surfaces the tradeoffs may be a bit different than if you are writing a basic raytracer for mesh objects or whatever. I haven't looked at your design in detail so I'm not sure what, if any, of the above you've already thought about.

Like any performance optimization process, you're going to have to measure first to find where you're actually spending the time, then improving things (algorithmically by preference, then code bumming by necessity)

simon
+2  A: 

I don't know D at all, so I'm not able to look at the code and find specific optimizations, but I can speak generally.

It really depends on your requirements. One of the simplest optimizations is just to reduce the number of reflections/refractions that any particular ray can follow, but then you start to lose out on the "perfect result".

Raytracing is also an "embarrassingly parallel" problem, so if you have the resources (such as a multi-core processor), you could look into calculating multiple pixels in parallel.

Beyond that, you'll probably just have to profile and figure out what exactly is taking so long, then try to optimize that. Is it the intersection detection? Then work on optimizing the code for that, and so on.

Chad Birch
It's already highly parallel :)The problem is, preliminary profiling seems to indicate that the ray/sphere and ray/plane collision tests take the longest and, well, I'm not sure what to do about them. I'm already precalculating everything I can.
FeepingCreature
+3  A: 

Is there anything else I can do to speed it up?

D, depending on the implementation and compiler, puts forth reasonably good performance. As you haven't explained what ray tracing methods and optimizations you're using already, then I can't give you much help there.

The next step, then, is to run a timing analysis on the program, and recode the most frequently used code or slowest code than impacts performance the most in assembly.

More generally, check out the resources in these questions:

I really like the idea of using a graphics card (a massively parallel computer) to do some of the work.

There are many other raytracing related resources on this site, some of which are listed in the sidebar of this question, most of which can be found in the raytracing tag.

Adam Davis
I'm trying to stay away from assembly :)Ray hits are lazily evaluated, and basic scene graph optimizations are performed (merging translation into the rotation matrix, stuff like that). Timing analysis indicates that the ray/sphere and ray/plane code are taking most of the time, but I honestly don't think I could do a better job than GDC at full optimization.
FeepingCreature
+1  A: 

Some suggestions.

  • Use bounding objects to fail fast.
  • Project the scene at a first step (as common graphic cards do) and use raytracing only for light calculations.
  • Parallelize the code.
Daniel Brückner
A: 

My first question is - are you trying to optimize the tracing of one single still screen, or is this about optimizing the tracing of multiple screens in order to calculate an animation ?

Optimizing for a single shot is one thing, if you want to calculate successive frames in an animation there are lots of new things to think about / optimize.

Led
Very much single-shot.
FeepingCreature
+1  A: 

One thing I learned with my ray tracer is that a lot of the old rules don't apply anymore. For example, many ray tracing algorithms do a lot of testing to get an "early out" of a computationally expensive calculation. In some cases, I found it was much better to eliminate the extra tests and always run the calculation to completion. Arithmetic is fast on a modern machine, but a missed branch prediction is expensive. I got something like a 30% speed-up on my ray-polygon intersection test by rewriting it with minimal conditional branches.

Sometimes the best approach is counter-intuitive. For example, I found that many scenes with a few large objects ran much faster when I broke them down into a large number of smaller objects. Depending on the scene geometry, this can allow your spatial subdivision algorithm to throw out a lot of intersection tests. And let's face it, intersection tests can be made only so fast. You have to eliminate them to get a significant speed-up.

Hierarchical bounding volumes help a lot, but I finally grokked the BSP tree, and got a HUGE increase in speed. Of course, building the tree has a cost that may make it prohibitive for real-time animation.

Watch for synchronization bottlenecks.

You've got to profile to be sure to focus your attention in the right place.

Adrian McCarthy
counter-intuitive is the way of many things these days. what i knew about optimization ten years ago is all out the window. branch prediction is a very important thing to understand.
DarenW
A: 

You have first to make sure that you use very fast algorithms (implementing them can be a real pain, but what do you want to do and how far want you to go and how fast should it be, that's a kind of a tradeof).

some more hints from me - don't use mailboxing techniques, in papers it is sometimes discussed that they don't scale that well with the actual architectures because of the counting overhead - don't use BSP/Octtrees, they are relative slow. - don't use the GPU for Raytracing, it is far too slow for advanced effects like reflection and shadows and refraction and photon-mapping and so on ( i use it only for shading, but this is my beer)

For a complete static scene kd-Trees are unbeatable and for dynamic scenes there are clever algorithms there that scale very well on a quadcore (i am not sure about the performance above).

And of course, for a realy good performance you need to use very much SSE code (with of course not too much jumps) but for not "that good" performance (im talking here about 10-15% maybe) compiler-intrinsics are enougth to implement your SSE stuff.

And some decent Papers about some Algorithms i was talking about:

"Fast Ray/Axis-Aligned Bounding Box - Overlap Tests using Ray Slopes" ( very fast very good paralelisizable (SSE) AABB-Ray hit test )( note, the code in the paper is not all code, just google for the title of the paper, youll find it)

http://graphics.tu-bs.de/publications/Eisemann07RS.pdf

"Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies"

http://www.sci.utah.edu/~wald/Publications/2007///BVH/download//togbvh.pdf

if you know how the above algorithm works then this is a much greater algorithm:

"The Use of Precomputed Triangle Clusters for Accelerated Ray Tracing in Dynamic Scenes"

http://garanzha.com/Documents/UPTC-ART-DS-8-600dpi.pdf

I'm also using the pluecker-test to determine fast (not thaat accurate, but well, you can't have all) if i hit a polygon, works very pretty with SSE and above.

So my conclusion is that there are so many great papers out there about so much Topics that do relate to raytracing (How to build fast, efficient trees and how to shade (BRDF models) and so on and so on), it is an realy amazing and interesting field of "experimentating", but you need to have also much sparetime because it is so damn complicated but funny.

Quonux
A: 

Raytrace every other pixel. Get the color in between by interpolation. If the colors vary greatly (you are on an edge of an object), raytrace the pixel in between. It is cheating, but on simple scenes it can almost double the performance while you sacrifice some image quality.

Render the scene on GPU, then load it back. This will give you the first ray/scene hit at GPU speeds. If you do not have many reflective surfaces in the scene, this would reduce most of your work to plain old rendering. Rendering CSG on GPU is unfortunately not completely straightforward.

Read source code of PovRay for inspiration. :)

Roman Zenka