I have been trying to perform some OpenGL ES performance optimizations in an attempt to boost the number of triangles per second that I'm able to render in my iPhone application, but I've hit a brick wall. I've tried converting my OpenGL ES data types from fixed to floating point (per Apple's recommendation), interleaving my vertex buffer objects, and minimizing changes in drawing state, but none of these changes have made a difference in rendering speed. No matter what, I can't seem to push my application above 320,000 triangles / s on an iPhone 3G running the 3.0 OS. According to this benchmark, I should be able to hit 687,000 triangles/s on this hardware with the smooth shading I'm using.
In my testing, when I run the OpenGL ES performance tool in Instruments against the running device, I'm seeing the statistic "Tiler Utilization" reaching nearly 100% when rendering my benchmark, yet the "Renderer Utilization" is only getting to about 30%. This may be providing a clue as to what the bottleneck is in the display process, but I don't know what these values mean, and I've not found any documentation on them. Does someone have a good description of what this and the other statistics in the iPhone OpenGL ES instrument stand for? I know that the PowerVR MBX Lite in the iPhone 3G is a tile-based deferred renderer, but I'm not sure what the difference would be between the Renderer and Tiler in that architecture.
If it helps in any way, the (BSD-licensed) source code to this application is available if you want to download and test it yourself. In the current configuration, it starts a little benchmark every time you load a new molecular structure and outputs the triangles / s to the console.