I'm working on an iPhone app that relies heavily on OpenGL. Right now it runs a bit slow on the iPhone 3G, but looks snappy on the new 32 GB iPod Touch. I assume this is hardware related. Anyway, I want to get the iPhone performance to resemble the iPod Touch performance. I believe I'm doing a lot of things sub-optimally in OpenGL and I'd like advice on which improvements will give me the most bang for the buck.

My scene rendering goes something like this:

  • Repeat 35 times
    • glPushMatrix
    • glLoadIdentity
    • glTranslate
    • Repeat 7 times
      • glBindTexture
      • glVertexPointer
      • glNormalPointer
      • glTexCoordPointer
      • glDrawArrays(GL_TRIANGLES, ...)
    • glPopMatrix

My Vertex, Normal and Texture Coords are already interleaved.

So, what steps should I take to speed this up? What step would you try first?

My first thought is to eliminate all those glBindTexture() calls by using a Texture Atlas.

What about some more efficient matrix operations? I understand the gl*() versions aren't too efficient.

What about VBOs?

Update

There are 8260 triangles. Texture sizes are 64x64 pngs. There are 58 different textures.

I have not run instruments.

Update 2

After running the OpenGL ES Instrument on the iPhone 3G I found that my Tiler Utilization is in the 90-100% range, and my Render Utilization is in the 30% range.

Update 3

Texture atlasing had no noticeable effect on the problem. Utilization ranges are still as noted above.

Update 4

Converting my Vertex and Normal pointers to GL_SHORT seemed to improve FPS, but the Tiler Utilization is still in the 90% range a lot of the time. I'm still using GL_FLOAT for my texture coordinates. I suppose I could knock those down to GL_SHORT and save four more bytes per vertex.

Update 5

Converting my texture coordinates to GL_SHORT yielded another performance increase. I'm now consistently getting >30 FPS. Tiler Utilization is still around 90%, but frequently drops down into the 70-80% range. The Renderer Utilization is hovering around 50%. I suppose this might have something to do with scaling the texture coordinates from GL_TEXTURE Matrix Mode.

I'm still seeking additional improvements. I'd like to get closer to 40 FPS, as that's what my iPod Touch gets and it's silky smooth there. If anyone is still paying attention, what other low-hanging fruit can I pick?
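For readers following along, the GL_FLOAT to GL_SHORT conversion described in Updates 4 and 5 can be sketched roughly as below. This is only an illustration, not the asker's actual code: the helper names `quantize_snorm16` and `dequantize_scale` and the `extent` parameter (the largest absolute coordinate in the model) are assumptions made for the example. The idea is to spread each float across the full signed 16-bit range, then undo the scaling on the GPU side, for instance with `glScalef` on the modelview matrix for positions or on the texture matrix for texture coordinates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a float in [-extent, +extent] onto the signed
 * 16-bit range [-32767, 32767], rounding to nearest and clamping values
 * that fall outside the expected range. */
int16_t quantize_snorm16(float value, float extent)
{
    float scaled;

    assert(extent > 0.0f);
    scaled = value / extent * 32767.0f;
    if (scaled > 32767.0f)  scaled = 32767.0f;
    if (scaled < -32767.0f) scaled = -32767.0f;

    /* Round half away from zero without needing <math.h>. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Hypothetical helper: the factor that maps the stored shorts back to
 * model units. In a fixed-function pipeline this would be applied with
 * something like glScalef(s, s, s) before drawing. */
float dequantize_scale(float extent)
{
    return extent / 32767.0f;
}
```

With an `extent` of 10 units, the worst-case rounding error is about half of `10 / 32767`, so the model keeps far more precision than a 480-pixel-wide screen can show. Forgetting the compensating scale would leave the geometry thousands of units wide, which is one plausible way for objects to vanish from view after such a conversion.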

+2  A: 

The first thing I would do is run Instruments profiling on the hardware device that is slow. It should show you pretty quickly where the bottlenecks are for your particular case.

Update after instruments results:

This question shows a similar result in Instruments to yours, so perhaps the advice there also applies in your case (basically, reducing the amount of vertex data).

justinlatimer
Yes, I saw that after messing around with Instruments and trying to understand what it was telling me. I will try the relevant advice, once I figure out which advice is relevant.
Rob Jones
That discussion is pretty heavy duty :) At least you've got a road to go down.
justinlatimer
Yeah, I thought so too. I'm using GL_FLOAT. I'm going to try moving some of the vertex attributes to GL_SHORT or GL_BYTE.
Rob Jones
I'm the one who asked that question, and the relevant tuning that made the most difference was going from GL_FLOAT to GL_SHORT. If done right, you don't lose resolution for your 3-D models, and I saw a 30% improvement in rendering performance from that alone.
Brad Larson
@Brad, I attempted going to GL_SHORT, but the objects were not visible when I rendered the view. As noted above, I'm doing glTranslates() on all three axes. I saw a comment elsewhere that suggested this could cause a problem, although the commenter did not specify the problem. Perhaps I will post this all as a new question.
Rob Jones
@Brad, never mind. I figured all that junk out.
Rob Jones
+3  A: 

If you only have 58 different 64x64 textures, a texture atlas seems like a good idea, since they'd all fit in a single 512x512 texture... if you don't rely on texture wrap modes, I'd certainly at least try this.
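As a rough sketch of the arithmetic behind that suggestion: a 512x512 atlas holds an 8x8 grid of 64x64 tiles, which gives 64 slots for the 58 textures. The code below assumes the tiles are packed row by row with no padding; the names `AtlasRect`, `atlas_rect_for_tile` and `atlas_remap` are made up for this example and do not come from any library.

```c
#include <assert.h>

enum {
    ATLAS_SIZE    = 512,                     /* atlas is 512x512 texels */
    TILE_SIZE     = 64,                      /* each source texture is 64x64 */
    TILES_PER_ROW = ATLAS_SIZE / TILE_SIZE   /* 8 tiles across, 8 down */
};

/* Texture-coordinate rectangle of one tile inside the atlas. */
typedef struct {
    float u0, v0;   /* corner nearest the atlas origin */
    float u1, v1;   /* opposite corner */
} AtlasRect;

/* Hypothetical helper: tiles are assumed to be packed row by row,
 * so tile 0 sits at the origin and tile 8 starts the second row. */
AtlasRect atlas_rect_for_tile(int tile_index)
{
    const float step = (float)TILE_SIZE / (float)ATLAS_SIZE;  /* 0.125 */
    AtlasRect rect;
    int col, row;

    assert(tile_index >= 0 && tile_index < TILES_PER_ROW * TILES_PER_ROW);
    col = tile_index % TILES_PER_ROW;
    row = tile_index / TILES_PER_ROW;

    rect.u0 = col * step;
    rect.v0 = row * step;
    rect.u1 = (col + 1) * step;
    rect.v1 = (row + 1) * step;
    return rect;
}

/* Remap an original coordinate t in [0, 1] into the range [lo, hi],
 * e.g. atlas_remap(u, rect.u0, rect.u1) for the u axis. */
float atlas_remap(float t, float lo, float hi)
{
    return lo + t * (hi - lo);
}
```

The remapping would be applied once, when the vertex data is built, so that all 58 textures can be drawn with a single bound texture. With linear filtering, neighbouring tiles can bleed into each other at the edges, so a real atlas may need a border of padding around each tile that this sketch leaves out.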

What format are your textures in? You might try using a compressed PVRTC texture; I think that's less load on the Tiler, and I've been pleasantly surprised by the image quality even for 2-bit-per-pixel textures. (Good for natural images, not good if you're doing something that looks like an 8-bit video game)

David Maymudes
They're pngs. I don't rely on texture wrap modes.
Rob Jones
@Rob Jones: he meant what format they are stored in in VRAM, not on disk: `GL_RGB`, `GL_RGBA`, `GL_LUMINANCE_ALPHA`, etc.
caspin
+1  A: 

Have you looked over the "OpenGL ES Programming Guide for iPhone OS" in the dev center? There are sections on Best Practices for Vertex Data and Texture Data.

Is your data formatted to be able to use triangle strips?

In terms of least effort, the modification sequence for you would probably be:

  • Reducing vertex attribute size
  • VBOs

Note that when you do these, you need to make sure that components are aligned on their native alignment, i.e. the floats or full ints are on 4-byte boundaries, the shorts are on 2-byte boundaries. If you don't do this it will tank your performance. It might be helpful to mentally map it by typing out your attribute ordering as a struct definition so you can sanity check your layout and alignment.

  • Making sure your data is stripped to share vertices
  • Using a texture atlas to reduce texture swaps
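The struct-definition trick mentioned above might look something like the following. Treat it as a sketch under assumptions: the `PackedVertex` name, the field order, and the choice to pad each three-component attribute out to four shorts are illustrative, not taken from the asker's code.

```c
#include <stddef.h>
#include <stdint.h>

/* One interleaved vertex using 16-bit components throughout.
 * The pad fields are an assumption: they round each 3-component
 * attribute up to 8 bytes so every attribute starts on a 4-byte offset. */
typedef struct {
    int16_t position[3];    /* offset 0  */
    int16_t position_pad;   /* unused, keeps normal on a 4-byte offset */
    int16_t normal[3];      /* offset 8  */
    int16_t normal_pad;     /* unused, keeps texcoord on a 4-byte offset */
    int16_t texcoord[2];    /* offset 16 */
} PackedVertex;             /* 20 bytes, versus 32 for 3+3+2 floats */

/* Sanity check for the layout: returns 1 when every attribute starts
 * on a 4-byte offset and the stride is the expected 20 bytes. */
int packed_vertex_layout_ok(void)
{
    return offsetof(PackedVertex, position) % 4 == 0
        && offsetof(PackedVertex, normal)   % 4 == 0
        && offsetof(PackedVertex, texcoord) % 4 == 0
        && sizeof(PackedVertex) == 20;
}

/* With such a struct, the pointer setup would use sizeof(PackedVertex)
 * as the stride and offsetof(...) for each attribute, along the lines of:
 *
 *   glVertexPointer(3, GL_SHORT, sizeof(PackedVertex), base + 0);
 *   glNormalPointer(GL_SHORT, sizeof(PackedVertex), base + 8);
 *   glTexCoordPointer(2, GL_SHORT, sizeof(PackedVertex), base + 16);
 *
 * where `base` is a hypothetical byte pointer to the first vertex
 * (or a byte offset into a bound VBO). */
```

Writing the layout down this way lets the compiler report the stride and offsets, so a misaligned attribute shows up as a failed check rather than as a mysterious slowdown on the device.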
nctrost
Yes, I've read the document.
Rob Jones
+1  A: 

The biggest win in graphics programming comes down to this:

Batch, Batch, Batch

Texture atlasing will make a bigger difference than almost anything else you can do. Switching textures is like stopping a speeding train to let on new passengers every time.

Combine all those textures into an atlas and cut your draw calls down a lot.

This web-based tool may be helpful: http://zwoptex.zwopple.com/

David Whatley
+1  A: 

To try converting your textures to the 16-bit RGB565 format, see this code in Apple's venerable Texture2D.m and search for kTexture2DPixelFormat_RGB565:

http://code.google.com/p/cocos2d-iphone/source/browse/branches/branch-0.1/OpenGLSupport/Texture2D.m

(this code loads PNGs and converts them to RGB565 at texture creation time; I don't know if there's an RGB565 file format as such)
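The conversion itself is small. The following is a minimal sketch of the bit packing, not the code from Texture2D.m; the function names are invented for the example, and the input is assumed to be tightly packed 8-bit RGBA with the alpha channel simply discarded.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit red, green and blue into one 16-bit RGB565 value:
 * the top 5 bits of red, top 6 bits of green, top 5 bits of blue. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert pixel_count RGBA8888 pixels (4 bytes each, alpha ignored)
 * into RGB565 (2 bytes each). dst must hold pixel_count values. */
void rgba8888_to_rgb565(const uint8_t *src, uint16_t *dst, size_t pixel_count)
{
    size_t i;

    for (i = 0; i < pixel_count; ++i) {
        dst[i] = pack_rgb565(src[4 * i], src[4 * i + 1], src[4 * i + 2]);
    }
}
```

The resulting buffer would then be uploaded with `glTexImage2D` using `GL_RGB` as the format and `GL_UNSIGNED_SHORT_5_6_5` as the type. Each pixel shrinks from 4 bytes to 2, so a 64x64 texture drops from 16 KB to 8 KB.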

For more information on PVRTC compressed textures (which looked way better than I expected when I used them, even at 2 bits per pixel) see Apple's PVRTextureLoader sample:

http://developer.apple.com/iPhone/library/samplecode/PVRTextureLoader/index.html

It has both the code for loading PVRTC textures in your app and instructions for using texturetool to convert your .png files into .pvr files.

David Maymudes
This looks useful. Thanks!
Rob Jones
It doesn't look like this has affected my FPS, but it knocked the memory footprint down half a meg. Thanks for the tips. I may yet look at PVRTC.
Rob Jones
+5  A: 

With a tiler utilization still above 90%, you’re likely still vertex throughput-bound. Your renderer utilization is higher because the GPU is rendering more frames. If your primary focus is improving performance on older devices, then the key is still to cut down on the amount of vertex data needed per triangle. There are two sides to this:

Reducing the amount of data per vertex: Now that all of your vertex attributes are already GL_SHORTs, the next thing to pursue is finding a way to do what you want using fewer attributes or components. For example, if you can live without specular highlights, using DOT3 lighting instead of OpenGL ES fixed-function lighting would replace your 3 shorts (+ 1 short of padding) for normals with 2 shorts for an extra texture coordinate. As an additional bonus, you’d be able to light your models per-pixel.

Reducing the number of vertices needed per triangle: When drawing with indexed triangles, you should make sure that your indices are sorted for maximum reuse. Running your geometry through Imagination Technologies’ PVRTTriStrip tool would probably be your best bet here.
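One way to get a feel for index reuse before reaching for a tool is to simulate a small vertex cache over the index list and count how many vertices would need to be processed. The sketch below uses a simple FIFO model; the cache size is a made-up parameter and the real hardware may behave differently, so the numbers are only useful for comparing one index ordering against another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_CACHE = 64 };   /* upper bound on the simulated cache size */

/* Hypothetical helper: walk the index list with a FIFO cache of
 * cache_size entries and return how many indices miss the cache,
 * i.e. how many vertices would have to be processed again. */
size_t count_vertex_cache_misses(const uint16_t *indices,
                                 size_t index_count,
                                 size_t cache_size)
{
    uint16_t cache[MAX_CACHE];
    size_t used = 0;     /* entries currently valid */
    size_t next = 0;     /* slot that the next miss overwrites */
    size_t misses = 0;
    size_t i, j;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);

    for (i = 0; i < index_count; ++i) {
        int hit = 0;
        for (j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            ++misses;
            cache[next] = indices[i];          /* evict the oldest entry */
            next = (next + 1) % cache_size;
            if (used < cache_size) {
                ++used;
            }
        }
    }
    return misses;
}
```

For a quad drawn as two indexed triangles, `{0, 1, 2, 2, 1, 3}`, this reports 4 misses, against 6 for the same quad drawn without shared vertices. Dividing the miss count by the triangle count gives a rough vertices-per-triangle figure that a reordering tool should push down.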

Pivot
What is DOT3 lighting? I'm having trouble finding a decent definition.
Rob Jones
I've decided to mark this as the best answer because it gets to the core of the problem: the size of the data being pushed down to OpenGL.
Rob Jones