Hello there, time for another XNA question. This time it is purely from a technical design standpoint though.

My situation is this: I've created a particle engine based on GPU calculations. It is far from complete, but it works. My GPU easily handles 10k particles without breaking a sweat, and I wouldn't be surprised if I could add a bunch more.

My problem: Whenever I have a lot of particles created at the same time, my frame rate hates me. Why? A lot of CPU-usage, even though I have minimized it to contain almost only memory operations.

Creation of particles is still done by CPU-calls such as:

  • A method wants to create a particle and makes a call.
  • A quad is created in the form of vertices and stored in a buffer.
  • The buffer is uploaded to the GPU and my CPU can focus on other things.

When I have about 4 emitters creating one particle per frame each, my FPS drops (admittedly by only 4 frames per second, but 15 emitters drop my FPS to 25).

Creation of a particle:

        // ### As you can see, not a lot of action here. ###
        // Build the six vertices (two triangles) of one particle quad.
        ParticleVertex[] tmpVertices = ParticleQuad.Vertices(Position, Velocity, this.TimeAlive);

        // Copy them into the CPU-side array, starting at slot i.
        for (int v = 0; v < 6; v++)
        {
            particleVertices[i + v] = tmpVertices[v];
        }

        // Upload the whole array to the GPU.
        particleVertexBuffer.SetData(particleVertices);

My thoughts are that maybe I shouldn't create particles that often, maybe there is a way to let the GPU create everything, or maybe I just don't know how this kind of thing is done. ;)

Edit: If I didn't create particles that often, what is the workaround for still making it look good?

So I am posting here in the hope that you know how a good particle engine should be designed, and whether I took a wrong turn somewhere.

+2  A: 

There is no way to have the GPU create everything (short of using geometry shaders, which require SM 4.0).

If I were creating a particle system for maximum CPU efficiency, I would pre-create (just to pick a number for sake of example) 100 particles in a vertex and index buffer like this:

  • Make a vertex buffer containing quads (four vertices per particle, not six as you have)
  • Use a custom vertex format which can store a "time offset" value, as well as an "initial velocity" value (similar to the XNA Particle 3D Sample)
  • Set the time value such that each particle has a time offset of 1/100th less than the last one (so offsets range from 1.0 to 0.01 through the buffer).
  • Set the initial velocity randomly.
  • Use an index buffer that gives you the two triangles you need using the four vertices for each particle.

And the cool thing is that you only need to do this once - you can reuse the same vertex buffer and index buffer for all your particle systems (provided they are big enough for your largest particle system).
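
The buffer layout described above could be built roughly like this. This is only a CPU-side sketch: `ParticleVertexSketch`, `ParticleBufferSketch`, and the `CornerX`/`CornerY` fields are hypothetical names invented for illustration, the velocity range of -1 to 1 is arbitrary, and uploading the arrays into an actual XNA vertex and index buffer is left out.

```csharp
using System;

// Hypothetical CPU-side vertex layout (a real XNA vertex type would also
// need a matching vertex declaration, omitted here).
public struct ParticleVertexSketch
{
    public float CornerX, CornerY;                 // quad corner, each -1 or +1 (assumption)
    public float VelocityX, VelocityY, VelocityZ;  // random initial velocity
    public float TimeOffset;                       // shared by the 4 vertices of a particle
}

public static class ParticleBufferSketch
{
    static readonly float[] CornersX = { -1f, 1f, 1f, -1f };
    static readonly float[] CornersY = { -1f, -1f, 1f, 1f };

    public static ParticleVertexSketch[] BuildVertices(int particleCount, Random random)
    {
        var vertices = new ParticleVertexSketch[particleCount * 4];
        for (int p = 0; p < particleCount; p++)
        {
            // Offsets step down by 1/particleCount: 1.0, 0.99, ... for 100 particles.
            float timeOffset = 1f - (float)p / particleCount;
            float vx = (float)(random.NextDouble() * 2.0 - 1.0);
            float vy = (float)(random.NextDouble() * 2.0 - 1.0);
            float vz = (float)(random.NextDouble() * 2.0 - 1.0);
            for (int c = 0; c < 4; c++)
            {
                vertices[p * 4 + c] = new ParticleVertexSketch
                {
                    CornerX = CornersX[c], CornerY = CornersY[c],
                    VelocityX = vx, VelocityY = vy, VelocityZ = vz,
                    TimeOffset = timeOffset
                };
            }
        }
        return vertices;
    }

    public static ushort[] BuildIndices(int particleCount)
    {
        // 16-bit indices can address at most 65536 vertices (16384 quads).
        if (particleCount < 0 || particleCount > 16384)
            throw new ArgumentOutOfRangeException(nameof(particleCount));

        var indices = new ushort[particleCount * 6];
        for (int p = 0; p < particleCount; p++)
        {
            int v = p * 4, i = p * 6;
            // Two triangles per quad: (0, 1, 2) and (0, 2, 3).
            indices[i + 0] = (ushort)(v + 0);
            indices[i + 1] = (ushort)(v + 1);
            indices[i + 2] = (ushort)(v + 2);
            indices[i + 3] = (ushort)(v + 0);
            indices[i + 4] = (ushort)(v + 2);
            indices[i + 5] = (ushort)(v + 3);
        }
        return indices;
    }
}
```

With 100 particles this gives 400 vertices and 600 indices, compared with 600 vertices for the six-vertices-per-quad layout.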

Then I would have a vertex shader that would take the following input:

  • Per-Vertex:
    • Time offset
    • Initial velocity
  • Shader Parameters:
    • Current time
    • Particle lifetime (which is also the particle time wrap-around value, and the fraction of particles in the buffer being used)
    • Particle system position/rotation/scale (the world matrix)
    • Any other interesting inputs you like, such as: particle size, gravity, wind, etc
    • A time scale (to get a real time, so velocity and other physics calculations make sense)

That vertex shader (again like the XNA Particle 3D Sample) could then determine the position of a particle's vertex based on its initial velocity and the time that that particle had been in the simulation.
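
As a CPU-side reference for what such a shader might compute, here is a minimal sketch. It assumes plain ballistic motion under constant gravity, which is my assumption and not necessarily the formula the XNA sample uses. `System.Numerics.Vector3` stands in for XNA's `Vector3`, and `ParticleMotionSketch` is a hypothetical name.

```csharp
using System;
using System.Numerics;

public static class ParticleMotionSketch
{
    // Offset of a particle from its emitter after `age` time units,
    // assuming constant acceleration: p = v*t + 0.5*g*t^2.
    public static Vector3 PositionAt(Vector3 initialVelocity, Vector3 gravity, float age)
    {
        return initialVelocity * age + gravity * (0.5f * age * age);
    }
}
```

The result would still need to be transformed by the particle system's world matrix, and `age` multiplied by the time scale first if the lifetime is stored in normalized units.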

The time for each particle would be (pseudo code):

        time = (currentTime + timeOffset) % particleLifetime;

In other words, as time advances, particles will be released at a constant rate (due to the offset). And whenever a particle dies at time = particleLifetime, time loops back around to time = 0.0 so that the particle re-enters the animation. (The wrap happens at particleLifetime, not at 1.0: for non-negative inputs, a floating-point modulus always returns a value from 0 up to, but not including, particleLifetime.)
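
The wrap-around can be checked on the CPU with a few lines of C#. This is only an illustration of the formula above; `ParticleClockSketch` is a hypothetical name, and in an HLSL shader the corresponding intrinsic would be `fmod`.

```csharp
using System;

public static class ParticleClockSketch
{
    // Age of one particle, wrapped into the range [0, particleLifetime).
    // Assumes currentTime and timeOffset are non-negative.
    public static float ParticleAge(float currentTime, float timeOffset, float particleLifetime)
    {
        return (currentTime + timeOffset) % particleLifetime;
    }
}
```

A particle with offset 1.0 and lifetime 1.0 has age 0.0 at time 0.0, so it is the first one to start. The same particle with lifetime 0.5 also starts at age 0.0, because 1.0 is a whole multiple of 0.5.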

Then, when it came time to draw my particles, I would have my buffers, shader and shader parameters set, and call DrawIndexedPrimitives. Now here's the clever bit: I would set startIndex and primitiveCount such that no particle starts out mid-animation. When the particle system first starts I'd draw 1 particle (2 primitives), and by the time that particle is about to die, I'd be drawing all 100 particles, the 100th of which would just be starting.

Then, a moment later, the 1st particle's timer would loop around and make it the 101st particle.

(If I only wanted 50 particles in my system, I'd just set my particle lifetime to 0.5 and only ever draw the first 50 of the 100 particles in the vertex/index buffer.)
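
A possible way to work out how many particles to draw while the system ramps up is sketched below. It assumes the offsets are laid out as described earlier (particle k starts at time k / bufferParticles) and that time is measured in the same normalized units as the offsets. `RampUpSketch` and its parameter names are hypothetical.

```csharp
using System;

public static class RampUpSketch
{
    // elapsedTime:     time since the system started emitting.
    // bufferParticles: particles in the shared buffer (100 in the example).
    // systemParticles: particles this system uses (lifetime * bufferParticles).
    public static int ParticlesToDraw(double elapsedTime, int bufferParticles, int systemParticles)
    {
        if (elapsedTime < 0) return 0;
        // Particle 0 starts at time 0, particle 1 at 1/bufferParticles, and so on.
        double started = Math.Floor(elapsedTime * bufferParticles) + 1;
        return (int)Math.Min(started, systemParticles);
    }

    // Each particle is a quad, drawn as two triangles.
    public static int PrimitiveCount(double elapsedTime, int bufferParticles, int systemParticles)
    {
        return 2 * ParticlesToDraw(elapsedTime, bufferParticles, systemParticles);
    }
}
```

During ramp-up the start index stays at 0 and only the primitive count passed to DrawIndexedPrimitives grows, until it reaches two triangles for every particle in the system.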

And when it came time to turn off the particle system - simply do the same in reverse - set the startIndex and primitiveCount such that particles stop being drawn after they die.

Now I must admit that I've glossed over the maths involved and some details about using quads for particles - but it should not be too hard to figure out. The basic principle to understand is that you're treating your vertex/index buffer as a circular buffer of particles.

One downside of a circular buffer is that, when you stop emitting particles, unless you stop when the current time is a multiple of the particle lifetime, you will end up with the active set of particles straddling the ends of the buffer with a gap in the middle - thus requiring two draw calls (a bit slower). To avoid this you could wait until the time is right before stopping - for most systems this should be OK, but it might look weird for some (e.g. a "slow" particle system that needs to stop instantly).
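
The straddling case can be made concrete with a sketch that returns the one or two particle ranges still alive after emission has stopped. It relies on the same assumptions as before (particle k starts at time k / bufferParticles, time in normalized units) and treats a particle that would start exactly at the stop time as not emitted. `ShutdownSketch` and its parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ShutdownSketch
{
    // Returns the ranges of particle slots that are still alive at `now`,
    // given that emission stopped at `stopTime` (now >= stopTime).
    public static List<(int FirstParticle, int Count)> DrawRangesAfterStop(
        double now, double stopTime, int bufferParticles, int systemParticles)
    {
        var ranges = new List<(int FirstParticle, int Count)>();

        // Emission number e happens at time e / bufferParticles and
        // uses particle slot e % systemParticles.
        long newest = (long)Math.Ceiling(stopTime * bufferParticles) - 1;
        long oldest = Math.Max(0, (long)Math.Floor(now * bufferParticles) - systemParticles + 1);

        long count = newest - oldest + 1;
        if (count <= 0) return ranges; // every particle has died

        int first = (int)(oldest % systemParticles);
        int untilEnd = systemParticles - first;
        if (count <= untilEnd)
        {
            ranges.Add((first, (int)count));
        }
        else
        {
            // The live set wraps past the end of the buffer: two draw calls.
            ranges.Add((first, untilEnd));
            ranges.Add((0, (int)count - untilEnd));
        }
        return ranges;
    }
}
```

For each range, the start index for DrawIndexedPrimitives would be FirstParticle * 6 and the primitive count would be Count * 2, given six indices per particle. Stopping at a multiple of the lifetime yields a single range, as the paragraph above describes.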

Another downside to this method is that particles must be released at a constant rate - although that is usually pretty typical for particle systems (obviously this is per-system and the rate is adjustable). With a little tweaking an explosion effect (all particles released at once) should be possible.

All that being said: If possible, it may be worthwhile using an existing particle library.

Andrew Russell