views:

413

answers:

4

Hello stackoverflow,

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating-point image buffer, converting it to unsigned chars at the last minute, and then uploading it to an OpenGL texture. To simulate glow, I'm rendering the same texture multiple times at different resolutions and offsets. This is proving too slow, so I'm looking at changing something. The problem is that my dev hardware is an Intel GMA950, while the target machine has an Nvidia GeForce 8800, so it's difficult to profile the OpenGL side at this stage.

I did some very unscientific profiling and found that most of the slowdown comes from dealing with the float image: scaling all the pixels by a constant to fade them out, converting the float image to unsigned chars, and uploading it to the graphics hardware. So, I'm looking at the following options for optimization:

  • Replace floats with uint32s in a 16.16 fixed-point format
  • Optimize float operations using SSE2 assembly (the image buffer is a 1024*768*3 array of floats)
  • Use the OpenGL accumulation buffer instead of a float array
  • Use OpenGL floating-point FBOs instead of a float array
  • Use OpenGL pixel/vertex shaders
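For reference, the first option could look roughly like this. It is a minimal sketch that assumes each channel keeps its brightness on a 0 to 255 scale in the integer part of the 16.16 value; `fix16`, `fix16_from_float`, `fade_fix16` and `fix16_to_u8` are hypothetical names. One detail worth noting: the fade multiply needs a 64-bit intermediate, because 255.0 in 16.16 (about 2^24) times a factor near 1.0 (2^16) does not fit in 32 bits.

```c
#include <stdint.h>
#include <stddef.h>

/* 16.16 fixed point: upper 16 bits = integer part, lower 16 bits = fraction.
   Hypothetical type; channel brightness assumed to live on a 0..255 scale. */
typedef uint32_t fix16;

#define FIX16_ONE ((fix16)1 << 16)

static fix16 fix16_from_float(float f)
{
    return (fix16)(f * 65536.0f + 0.5f);  /* round to nearest 1/65536 */
}

/* Fade every channel by a 16.16 factor (e.g. 0.98). The 64-bit product
   avoids overflow; the shift truncates, so values decay toward zero. */
static void fade_fix16(fix16 *buf, size_t n, fix16 factor)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = (fix16)(((uint64_t)buf[i] * factor) >> 16);
}

/* Convert to bytes for upload: keep the integer part and clamp to 255,
   since additive blending can push a channel above 255.0. */
static void fix16_to_u8(const fix16 *buf, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t v = buf[i] >> 16;
        out[i] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

Fading then costs one integer multiply and shift per channel; whether that actually beats SSE floats on a given CPU is something to measure rather than assume.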

Do you have any experience with any of these approaches? Any thoughts or advice? Anything else I haven't thought of?

Thanks in advance,
D

+1  A: 

Try replacing the manual code with sprites: an OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).
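To make the "ten sprites at 10% alpha" arithmetic concrete, here is a CPU model of one common additive blend setup, `glBlendFunc(GL_SRC_ALPHA, GL_ONE)`, which computes dst = src * src_alpha + dst. The answer doesn't name a blend function, so that choice is an assumption, and `blend_additive` and `stack_sprites` are hypothetical helpers, not OpenGL calls. Values are normalized to [0, 1] and clamped, as a fixed-point color buffer would be.

```c
/* Models glBlendFunc(GL_SRC_ALPHA, GL_ONE) for one channel of one pixel:
   dst = src * src_alpha + dst, clamped to 1.0 like a fixed-point buffer. */
static float blend_additive(float dst, float src, float src_alpha)
{
    float out = dst + src * src_alpha;
    return out > 1.0f ? 1.0f : out;
}

/* Draw `count` sprites with colour `src` and alpha `alpha` on the same pixel,
   starting from a black background. */
static float stack_sprites(float src, float alpha, int count)
{
    float dst = 0.0f;
    for (int i = 0; i < count; ++i)
        dst = blend_additive(dst, src, alpha);
    return dst;
}
```

Five white sprites at 10% give half brightness, ten give full brightness, and anything beyond that saturates. With an 8-bit color buffer each blend result is stored in one of 256 levels, which is the precision limit the comments below run into.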

Aaron Digulla
Thanks, but the performance problem isn't in the sprite rendering (I'm just drawing single pixels anyway).
damian
As I said: Stop drawing this yourself and replace each particle with several semi-transparent sprites. The more sprites you draw at one place, the more "glow" you should get.
Aaron Digulla
I didn't make this clear in the original question, but I'm really interested in leaving nice trails; the glow is secondary. Nice trails need float images so you can fade them out smoothly over a long time.
damian
Have you tried to reduce the alpha value over time? That should give you a fade effect, too.
Aaron Digulla
Yep, tried that. 256 levels per channel aren't enough for a nice fade...
damian
+1  A: 

If by "manual" you mean that you are using the CPU to poke pixels, I think pretty much anything you can do to draw textured polygons with OpenGL instead will be a huge speedup.

unwind
Thanks; I'm not poking pixels on the graphics hardware. I'm using additive blending on a floating-point array and then drawing that as a texture. The biggest problem with using textured polygons directly is getting access to and modifying the FBO afterwards, so that I can do trails that fade out over time.
damian
+4  A: 

The problem is simply the sheer amount of data you have to process.

Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:

  • Clear the buffer
  • Render something on it (uses reads and writes)
  • Convert to unsigned bytes
  • Upload to OpenGL

That's a lot of data to move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times. Then you move 45 MB of data in and out of slow main memory. 45 MB does not sound like much, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for data to arrive.
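The figures in this estimate can be checked with a few lines of arithmetic. The sketch assumes a 4-byte `float` and counts a megabyte as 2^20 bytes, which is what makes the numbers come out to exactly 9 and 45; `float_image_bytes` and `bytes_per_frame` are hypothetical helpers.

```c
#include <stddef.h>

/* Bytes in a width x height image with `channels` floats per pixel. */
static size_t float_image_bytes(size_t width, size_t height, size_t channels)
{
    return width * height * channels * sizeof(float);
}

/* Bytes moved per frame if every pixel is touched `passes` times and,
   as the answer assumes, nothing survives in cache between passes. */
static size_t bytes_per_frame(size_t image_bytes, size_t passes)
{
    return image_bytes * passes;
}
```

For 1024*768*3 floats this gives 9,437,184 bytes (9 MB), and five passes give 47,185,920 bytes (45 MB). At an assumed 60 frames per second, that is roughly 2.6 GB per second of mostly uncached memory traffic.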

If you want to keep the rendering on the CPU, there's not much you can do. Some ideas:

  • Using SSE non-temporal loads and stores may help, but they will complicate your task quite a bit (you have to align your reads and writes).

  • Try breaking up your rendering into tiles, for example by doing everything on smaller rectangles (256*256 or so). The idea is that you actually get a benefit from the cache. After you've cleared a rectangle, for example, that entire tile will be in the cache. Rendering and converting to bytes will be a lot faster then, because there is no need to fetch the data from relatively slow main memory anymore.

  • Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
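The tiling idea can be sketched for the fade and byte-conversion steps; binning particles per tile so that rendering can be tiled too is not shown. This assumes interleaved RGB floats in [0, 1], stored row-major, and uses the 256-pixel tile from the answer. At 12 bytes per pixel a 256*256 tile is 768 KB, so whether it actually fits depends on the target's cache, and a smaller tile may work better. `to_u8`, `fade_convert_untiled` and `fade_convert_tiled` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CHANNELS 3  /* interleaved RGB floats, row-major (assumed layout) */
#define TILE 256    /* tile edge in pixels; tune for the target CPU's cache */

/* Map a [0, 1] float to a byte, clamping values pushed out of range by additive blending. */
static uint8_t to_u8(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 1.0f) return 255;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Untiled: each pass streams the whole buffer (9 MB at 1024*768) through the cache. */
static void fade_convert_untiled(float *img, uint8_t *out, size_t w, size_t h, float fade)
{
    size_t n = w * h * CHANNELS;
    for (size_t i = 0; i < n; ++i) img[i] *= fade;
    for (size_t i = 0; i < n; ++i) out[i] = to_u8(img[i]);
}

/* Tiled: both passes run on one tile before moving on, so the convert pass
   finds the floats just written by the fade pass still in cache. */
static void fade_convert_tiled(float *img, uint8_t *out, size_t w, size_t h, float fade)
{
    for (size_t ty = 0; ty < h; ty += TILE) {
        size_t th = (h - ty < TILE) ? h - ty : TILE;          /* last row of tiles may be short */
        for (size_t tx = 0; tx < w; tx += TILE) {
            size_t tw = (w - tx < TILE) ? w - tx : TILE;      /* last column may be narrow */
            size_t span = tw * CHANNELS;                      /* floats per tile row */
            for (size_t y = ty; y < ty + th; ++y) {
                float *row = img + (y * w + tx) * CHANNELS;
                for (size_t i = 0; i < span; ++i) row[i] *= fade;
            }
            for (size_t y = ty; y < ty + th; ++y) {
                const float *row = img + (y * w + tx) * CHANNELS;
                uint8_t *orow = out + (y * w + tx) * CHANNELS;
                for (size_t i = 0; i < span; ++i) orow[i] = to_u8(row[i]);
            }
        }
    }
}
```

Both functions produce identical output; only the order in which memory is visited changes, which is the whole point of the technique.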

The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get working with OpenGL because you have to decide which extension to use, but once you have it working, performance is not an issue anymore.

Btw, do you really need floating-point render targets? If you can get away with 3 bytes per pixel, you will see a nice performance improvement.

Nils Pipenbrinck
Thanks for your answer! I didn't make this clear in the original question, but I'm really interested in leaving nice trails, which need float images so you can fade them out smoothly over a long time...
damian
You could use half-floats or 16-bit-per-channel integers then...
Nils Pipenbrinck
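The 16-bit suggestion also addresses the "256 levels aren't enough" problem from the earlier comments. The sketch below uses an arbitrarily chosen fade factor of 243/256 (about 0.95) per frame and hypothetical helpers `fade_u8`, `fade_u16` and `u16_to_display`. With round-to-nearest integer math, an 8-bit channel stops fading once v * 13/256 < 0.5, i.e. at v = 9, leaving a permanent faint smear. A 16-bit channel gets stuck at the same tiny value, 9/65535, which its top 8 bits display as black. Half-floats are not shown.

```c
#include <stdint.h>

/* Fade factor 243/256 (about 0.95), applied with round-to-nearest. */
#define FADE_NUM 243u

static uint8_t fade_u8(uint8_t v)
{
    return (uint8_t)((v * FADE_NUM + 128u) >> 8);
}

static uint16_t fade_u16(uint16_t v)
{
    /* 65535 * 243 + 128 fits comfortably in 32 bits */
    return (uint16_t)(((uint32_t)v * FADE_NUM + 128u) >> 8);
}

/* A 16-bit channel reaches the screen through its top 8 bits. */
static uint8_t u16_to_display(uint16_t v)
{
    return (uint8_t)(v >> 8);
}
```

Starting from full brightness and fading for a long time, the 8-bit trail ends at 9/255 and never disappears, while the 16-bit trail fades all the way to black on screen.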
+2  A: 

It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.

Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g., accumulate their positions per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either in shaders (the "pro" way), by rendering to an accumulation buffer and back, or simply by generating a bunch of additional sprites on the CPU that represent the trail and throwing them at the rasterizer.
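The CPU side of this split could look like the minimal sketch below, under assumed details: 2D particles in a structure-of-arrays layout, so that four particles' x values fill one SSE register, with hypothetical names `Particles` and `particles_step`. The packed-float intrinsics used (`_mm_set1_ps`, `_mm_loadu_ps`, `_mm_mul_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are SSE-level and therefore available on any SSE2 CPU. Unaligned loads and stores sidestep the alignment requirement mentioned in the earlier answer, possibly at some cost in speed on older CPUs, and a scalar loop handles leftover particles and non-SSE builds.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Structure-of-arrays particle state (hypothetical layout). */
typedef struct {
    float *x, *y;    /* positions */
    float *vx, *vy;  /* velocities, units per second */
    size_t count;
} Particles;

/* Advance every particle by velocity * dt, four at a time where SSE is available. */
static void particles_step(Particles *p, float dt)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 vdt = _mm_set1_ps(dt);
    for (; i + 4 <= p->count; i += 4) {
        __m128 x = _mm_loadu_ps(p->x + i);
        __m128 y = _mm_loadu_ps(p->y + i);
        x = _mm_add_ps(x, _mm_mul_ps(_mm_loadu_ps(p->vx + i), vdt));
        y = _mm_add_ps(y, _mm_mul_ps(_mm_loadu_ps(p->vy + i), vdt));
        _mm_storeu_ps(p->x + i, x);
        _mm_storeu_ps(p->y + i, y);
    }
#endif
    /* Scalar tail: remaining particles, or all of them without SSE. */
    for (; i < p->count; ++i) {
        p->x[i] += p->vx[i] * dt;
        p->y[i] += p->vy[i] * dt;
    }
}
```

The updated positions would then be handed to OpenGL as sprite positions each frame, leaving the blending to the GPU.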

Crashworks