Hi there!

I am working on an iPad app that renders hundreds of sprites (2D images) each frame. I am using a modified drawing method taken from the SDK's OpenGL template, but the problem is that I get only 3 fps, and I am not even rendering everything I need.

I tried simple optimizations like using texture atlases, minimizing the number of state changes, and high-level clipping, and I even render sprites sorted by common attributes such as color or texture, but it didn't seem to help much. I can't use PVR compression because my images have fine edges and an alpha channel that look terrible when compressed (I use it only on a few background images).

I am now trying to use VBOs, but I am not sure if they are good for simple sprites (2 triangles). I always thought they were meant for models with a larger number of vertices. I am not even sure how to implement them correctly. I will probably need to store a VBO index in my sprite class. The problem is that I don't always use a class to render a sprite; sometimes I just calculate the position, size and UV of a sprite on the fly (e.g. text rendering). Any ideas whether using VBOs for sprite rendering will give some performance boost?

Here is my render function:

- (void)RenderTexture:(GLTexture*)tex InRect:(CGRect)dest WithUV:(CGRect)uv Color:(LSColor*)color Effect:(SpriteEffect)effect Rotation:(float)rot AroundPoint:(CGPoint)rotCenter {
 if(tex.ID != mLastBoundTexture) {
   [tex bind];
   mLastBoundTexture = tex.ID;
 }

 mSquareVertices[2] = mSquareVertices[6] = dest.size.width;
 mSquareVertices[5] = mSquareVertices[7] = dest.size.height;

 mSquareUVs[0] = mSquareUVs[4] = uv.origin.x;
 mSquareUVs[1] = mSquareUVs[3] = uv.origin.y;
 mSquareUVs[2] = mSquareUVs[6] = uv.origin.x + uv.size.width;
 mSquareUVs[5] = mSquareUVs[7] = uv.origin.y + uv.size.height;

 mSquareColors[0] = mSquareColors[4] = mSquareColors[8] = mSquareColors[12] = color.red;
 mSquareColors[1] = mSquareColors[5] = mSquareColors[9] = mSquareColors[13] = color.green;
 mSquareColors[2] = mSquareColors[6] = mSquareColors[10] = mSquareColors[14] = color.blue;
 mSquareColors[3] = mSquareColors[7] = mSquareColors[11] = mSquareColors[15] = color.alpha;

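 // NOTE: the matrix loaded here appears to be overwritten by the next call,
 // so rotCenter (and rot) currently have no effect on the result
 mat4f_LoadTranslation2f(rotCenter.x, rotCenter.y, mModelViewMatrix);

 mat4f_LoadTranslation2f(dest.origin.x, dest.origin.y, mModelViewMatrix);
 mat4f_MultiplyMat4f(mProjectionMatrix, mModelViewMatrix, mModelViewProjMatrix);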

 if(mLastUsedShader != effect) {
   int program;
   if(effect == SENormal) {
     glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
     program = mShaderNormal;
   }
   else if(effect == SEMultiply) {
     glBlendFunc(GL_DST_COLOR, GL_ZERO);
     program = mShaderMultiply;
   }
   else {
     // Unknown effect: skip the draw instead of calling glUseProgram
     // with an uninitialized program
     NSLog(@"Implement SpriteEffect %i", effect);
     return;
   }

   glUseProgram(program);
   mLastUsedShader = effect;
 }
 glUniformMatrix4fv(uniforms[UNIFORM_MODELVIEW_PROJECTION_MATRIX], 1, GL_FALSE, mModelViewProjMatrix);

 // Update attribute values
 glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, mSquareVertices);
 glVertexAttribPointer(ATTRIB_UV, 2, GL_FLOAT, 0, 0, mSquareUVs);
 glVertexAttribPointer(ATTRIB_COLOR, 4, GL_UNSIGNED_BYTE, 1, 0, mSquareColors);

 glUniform4fv(uniforms[UNIFORM_POSTPROCES_PARAMS], 4, mPostprocessParams);

 // Draw
 glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}

I think that color could be another area for improvement, because it doesn't change very often (a few times per frame), but I don't know how to set it for longer than just the current render call.

Do you see any other areas where I can improve my framerate? I really need to get this to at least 30 fps.

EDIT: it turns out I had a way too complicated fragment shader. I feel stupid for not disabling it to test. Looks like I will have to say goodbye to my desaturation feature. With the default fragment shader I can easily get over 60 fps.

A: 

You might want to look into point sprites to speed up rendering.

Matias Valdenegro
My sprites use shaders and are often rectangular, not squares. I checked point sprites and they seem to support only basic functionality. Did I miss something?
Lope
+1  A: 

My gut feeling is that you're simply fill-rate bound.

How many pixels do your 100 sprites cover? The GPU has a limited capacity for computing pixels (especially with blending on, and you said you have alpha, since blending requires both reading and writing the framebuffer). If you generate too many of them, your frame rate will suffer dramatically. The worst case for you would be that each sprite covers your whole screen, so you would shade ~100x the total pixel count of your screen (that 100x is what we call the overdraw factor).
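As a rough illustration of that arithmetic, here is a minimal sketch in C. The function name `overdraw_factor` is made up for this example, and the estimate assumes every sprite is fully on screen, so it is an upper bound rather than a measurement.

```c
#include <assert.h>

/* Average overdraw factor: total pixels covered by all sprites divided by
   the number of pixels on screen. Assumes no sprite is clipped, so the
   result is an upper bound. (Hypothetical helper, not part of any SDK.) */
static double overdraw_factor(int sprite_count, int sprite_w, int sprite_h,
                              int screen_w, int screen_h)
{
    assert(screen_w > 0 && screen_h > 0);
    double sprite_pixels = (double)sprite_count * sprite_w * sprite_h;
    double screen_pixels = (double)screen_w * screen_h;
    return sprite_pixels / screen_pixels;
}
```

With the figures given in the comments below (about 2000 sprites of 64x64 pixels) and a 1024x768 screen, `overdraw_factor(2000, 64, 64, 1024, 768)` comes out at roughly 10.4.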

Another possibility is that you're shader bound. What does your fragment shader do? What happens if you replace it with a simple constant color output?

I don't think that the geometry submission has anything to do with your perf issues (not for 100 sprites).

Bottom line: to look at performance, you want to use performance analysis tools. I don't code for the iPad myself. Does the SDK provide any tools to analyze performance?

Bahbar
I said hundreds, not hundred :) there is really a lot of them, almost 2000 at peak. But they are pretty small, 64x64. There is a lot of overdraw (but shouldn't be more than 10). My fragment and vertex shaders are really simple, using only constant color output helps a bit but it is nowhere near where I need it to be.
Lope
There are performance tools, but I always have a hard time using them. I don't know how to interpret them properly even after some googling; I will have to look into that more. A bigger problem is that 75% of attempts fail for some reason and Instruments just freezes, or stops working, or some other random thing prevents me from using it properly.
Lope
@Lope: Well, to verify, you can make them way smaller. 10x overdraw is big. But 2000 is starting to make it significantly more costly on the cpu too... You should try to batch them at that point.
Bahbar
Yeah, I planned to batch them, but I have no idea how to do it. Everywhere I looked only said that I should batch them, but I wasn't able to find out how exactly to do it in ES 2.0. Any pointers? I will try to make them smaller and I will post results.
Lope
When I scale everything down to 10% of the original size, I get 10 fps (up from 3 at the original size), which is still way too low. Overdraw is now below 1 (there is a lot of empty space).
Lope
@Lope: So you're fill-rate bound, and once the sprites get significantly smaller, you become bound by something else. Bottom line is, you're trying to make the iPad do too much for your target fps.
Bahbar
OK, thanks a lot for your help so far. I will try to change my game so that it doesn't use so many sprites, play with it a little more, and see what I can get out of it.
Lope
Turns out I had a too complicated shader. But your answer helped me a lot to understand a little more about OpenGL, so I guess I should mark it as accepted. Thanks again :)
Lope
+1  A: 

Hi,

You are not fill-rate bound (well, you might be, but there's a much bigger problem). You said you have 2000 sprites. For each one you set the vertex/fragment shader separately, you calculate the projection and other matrices for EACH sprite, and you render only a single sprite with each draw call. That way you will never be able to render a decent number of sprites, no matter whether you actually use textures/complex shaders or just plain flat shading.

What you have to do: batching. Batching means accumulating as many sprites as possible into a single vertex buffer object and then drawing them with a single call to glDrawElements/glDrawArrays. There are a couple of things that might keep you from batching sprites: they use different textures (they shouldn't; use a texture atlas), they use different shaders (unlikely for 2000 sprites unless you do something really, really weird), and so on. These can be solved to some extent: sort by z-order, then by material, where material is texture/shader. Then you can send bigger groups of sprites to the GPU in one call.
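To make the sorting step concrete, here is a minimal sketch in C. The `SpriteCmd` struct and the function names are invented for this illustration (they are not taken from the question's code or from libgdx); each entry stands for one queued sprite, and a "batch" is a run of consecutive sprites that share texture and effect and could therefore go out in one draw call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One queued sprite (hypothetical layout for this sketch). */
typedef struct {
    int z;                /* draw order, back to front */
    unsigned int texture; /* texture id */
    int effect;           /* shader / blend mode */
} SpriteCmd;

/* Order by z first, then by material (texture, then effect). */
static int compare_sprites(const void *pa, const void *pb)
{
    const SpriteCmd *a = pa;
    const SpriteCmd *b = pb;
    if (a->z != b->z)
        return (a->z > b->z) - (a->z < b->z);
    if (a->texture != b->texture)
        return (a->texture > b->texture) - (a->texture < b->texture);
    return (a->effect > b->effect) - (a->effect < b->effect);
}

static void sort_sprites(SpriteCmd *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n > 1)
        qsort(s, n, sizeof *s, compare_sprites);
}

/* Number of draw calls needed if every run of consecutive sprites with the
   same texture and effect is submitted as one batch. */
static size_t count_batches(const SpriteCmd *s, size_t n)
{
    size_t batches = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || s[i].texture != s[i - 1].texture ||
            s[i].effect != s[i - 1].effect)
            batches++;
    }
    return batches;
}
```

Sorting by z first keeps blending correct; the material keys only reorder sprites within the same z level, which is where most of the batch merging comes from.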

The last thing I should mention: you will have to do the transformations on the CPU yourself, instead of setting a new matrix for each sprite and letting the GPU do the work.
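A minimal sketch of what that could look like in C, under some assumptions: `SpriteVertex`, `SpriteBatch` and `batch_push` are hypothetical names for this example, the pivot is given relative to the sprite's top-left corner, and the caller passes the cosine and sine of the rotation angle (1 and 0 for an unrotated sprite) so the common case needs no trigonometry. Each sprite becomes two triangles with already-transformed positions, so only the projection matrix has to be set once per batch.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved vertex: position, texture coordinate, color. */
typedef struct {
    float x, y;
    float u, v;
    unsigned char r, g, b, a;
} SpriteVertex;

#define MAX_SPRITES 2048
#define VERTS_PER_SPRITE 6 /* two triangles */

typedef struct {
    SpriteVertex verts[MAX_SPRITES * VERTS_PER_SPRITE];
    size_t count; /* number of vertices currently stored */
} SpriteBatch;

/* Append one sprite with its rotation and translation applied on the CPU.
   Returns 0 if the batch is full (the caller should flush it and retry). */
static int batch_push(SpriteBatch *b, float x, float y, float w, float h,
                      float u0, float v0, float u1, float v1,
                      float px, float py, float cos_a, float sin_a,
                      const unsigned char rgba[4])
{
    /* Corners: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right. */
    static const int order[VERTS_PER_SPRITE] = {0, 1, 2, 1, 3, 2};
    const float cx[4] = {0.0f, w, 0.0f, w};
    const float cy[4] = {0.0f, 0.0f, h, h};
    const float cu[4] = {u0, u1, u0, u1};
    const float cv[4] = {v0, v0, v1, v1};

    assert(b != NULL && rgba != NULL);
    if (b->count + VERTS_PER_SPRITE > MAX_SPRITES * VERTS_PER_SPRITE)
        return 0;

    for (int i = 0; i < VERTS_PER_SPRITE; i++) {
        int k = order[i];
        float dx = cx[k] - px; /* corner relative to the pivot */
        float dy = cy[k] - py;
        SpriteVertex *v = &b->verts[b->count++];
        v->x = x + px + dx * cos_a - dy * sin_a;
        v->y = y + py + dx * sin_a + dy * cos_a;
        v->u = cu[k];
        v->v = cv[k];
        v->r = rgba[0]; v->g = rgba[1]; v->b = rgba[2]; v->a = rgba[3];
    }
    return 1;
}

/* To draw, upload verts[0..count) (e.g. with glBufferData) and issue one
   glDrawArrays(GL_TRIANGLES, 0, count) per batch, then reset count to 0. */
```

Using six vertices per sprite keeps the example short; an index buffer with four vertices per sprite and glDrawElements would trade a little setup for less vertex data.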

For an example of how such a sprite batch might look, you can check out the SpriteBatch class I wrote for an Android game dev lib. It's not 100% optimal but pretty close, and it works for both GL ES 1.x and 2.0 (it uses a static shader in the latter case for now, though). In there you can see how to easily transform the vertices of your sprites yourself without matrices. You can find the code at http://code.google.com/p/libgdx/source/browse/trunk/gdx/src/com/badlogic/gdx/graphics/SpriteBatch.java

hth, Mario

Mario Zechner
Hey! Thanks for your answer. Actually, I don't change the shader every time I render something; I change it only if necessary, and the same goes for the texture. I do, however, calculate the modelview matrix for each sprite; I thought that was the right way to do it. This matrix was also the main reason I didn't know how to batch: I didn't realize I should calculate positions on the CPU. Thanks for your help, I will try to implement it and see if it helps to increase the framerate even more.
Lope