views: 1095 · answers: 6

I'm drawing quads in OpenGL. My question is: is there any additional performance gain from this:

// Method #1

glBegin(GL_QUADS);
// Define vertices for 10 quads
glEnd();

... over doing this for each of the 10 quads:

// Method #2

glBegin(GL_QUADS);
// Define vertices for first quad
glEnd();

glBegin(GL_QUADS);
// Define vertices for second quad
glEnd();

//etc...

All of the quads use the same texture in this case.
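Concretely, with illustrative placeholder coordinates (the actual vertex data is elided above), Method #1 might look something like:

```c
#include <GL/gl.h>

/* Method #1: one glBegin/glEnd pair around all the quads.
   The coordinates and layout here are made-up placeholders. */
void draw_quads_batched(int count)
{
    glBegin(GL_QUADS);
    for (int i = 0; i < count; ++i) {
        float x = (float)i;  /* hypothetical side-by-side layout */
        glTexCoord2f(0.0f, 0.0f); glVertex2f(x,        0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f(x + 1.0f, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f(x + 1.0f, 1.0f);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(x,        1.0f);
    }
    glEnd();
}
```

Method #2 would wrap each loop iteration in its own glBegin(GL_QUADS)/glEnd() pair.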

+1  A: 

I believe the answer is yes, but you should try it out yourself. Write something that draws 100k quads and see if one way is much faster. Then report your results here :)

schnaader: What the document you read means is that you should not have non-GL-related code between glBegin and glEnd. It does not mean that you should split your drawing into many short glBegin/glEnd pairs rather than a single one.

Jameson
Thanks for your answer, my results are posted in an answer below :)
Kevin Laity
A: 

I suppose that you get the highest performance gain by reusing the vertices. To achieve that, you would need to maintain some structure for the primitives yourself.
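One sketch of that idea under legacy fixed-function OpenGL: store each unique vertex once and reference it by index, so adjacent quads share their corner vertices (the coordinates here are illustrative):

```c
#include <GL/gl.h>

/* Two adjacent unit quads sharing an edge: 6 unique vertices
   instead of 8. The index array lets OpenGL reuse the shared pair. */
static const GLfloat verts[] = {
    0.0f, 0.0f,   1.0f, 0.0f,   1.0f, 1.0f,   0.0f, 1.0f,  /* quad 1 */
    2.0f, 0.0f,   2.0f, 1.0f                                /* quad 2 extras */
};
static const GLubyte indices[] = {
    0, 1, 2, 3,   /* first quad */
    1, 4, 5, 2    /* second quad reuses vertices 1 and 2 */
};

void draw_shared(void)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glDrawElements(GL_QUADS, 8, GL_UNSIGNED_BYTE, indices);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```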

pmr
+4  A: 

Yes, the first is faster, because each call to glBegin or glEnd changes the OpenGL state.

Even better, however, than one call to glBegin and glEnd (if you have a significant number of vertices), is to pass all of your vertices with glVertexPointer (and friends), and then make one call to glDrawArrays or glDrawElements. This will send all your vertices to the GPU in one fell swoop, instead of incrementally by calling glVertex3f repeatedly.
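For example, a minimal sketch of that approach (NUM_QUADS and the contents of verts[] are assumptions; texture coordinates omitted for brevity):

```c
#include <GL/gl.h>

/* All quad vertices in one client-side array, drawn with a single call. */
#define NUM_QUADS 10
GLfloat verts[NUM_QUADS * 4 * 2];  /* 4 corners per quad, x/y each */

void draw_all(void)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glDrawArrays(GL_QUADS, 0, NUM_QUADS * 4);  /* one call for everything */
    glDisableClientState(GL_VERTEX_ARRAY);
}
```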

Jesse Beder
+2  A: 

From a function-call-overhead perspective, the second approach is more expensive. If, instead of ten quads, we used ten thousand, then glBegin/glEnd would be called ten thousand times per frame instead of once.

More importantly glBegin/glEnd have been deprecated as of OpenGL 3.0, and are not supported by OpenGL ES.

Instead, vertices are submitted as vertex arrays using calls such as glDrawArrays. Tutorials and much more in-depth information can be found on the NeHe site.
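In the post-glBegin world, the same data typically lives in a vertex buffer object so it can be drawn from GPU memory. A rough sketch, assuming an OpenGL 1.5+ context (buffer sizes and the two-floats-per-vertex layout are illustrative):

```c
#include <GL/gl.h>

GLuint vbo;

/* Upload the quad vertices into a VBO once, up front. */
void init_vbo(const GLfloat *verts, int num_quads)
{
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER,
                 num_quads * 4 * 2 * sizeof(GLfloat),
                 verts, GL_STATIC_DRAW);
}

/* Each frame: draw straight from the buffer, no per-vertex calls. */
void draw_vbo(int num_quads)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, (void *)0);  /* offset into the VBO */
    glDrawArrays(GL_QUADS, 0, num_quads * 4);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```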

Adaptation
+1  A: 

I decided to go ahead and benchmark it using a loop of 10,000 quads.

The results:

Method 1: 0.0128 seconds

Method 2: 0.0132 seconds

Method #1 does show some improvement, but it is very marginal (about 3%). It's probably nothing more than the overhead of simply calling more functions. So it's likely that OpenGL itself doesn't get any additional optimization from Method #1.

This is on Windows XP Service Pack 3 using OpenGL 2.0 and Visual Studio 2005.

Kevin Laity
How did you benchmark it? It's very hard to benchmark OpenGL calls - just setting a timer before and after the calls doesn't necessarily take into account sending the data to the GPU, any computation on the GPU, and rendering.
Jesse Beder
In this case I'm only concerned with how much time is taken hanging the thread that I'm on. If my timer gets tripped after the code is called, then from my perspective that's how long OpenGL is taking. On the previous engine I worked with, we were dealing with a million little textures, which meant a million texture calls, which meant slowdown of our games. I just want to make sure we're not wasting OpenGL's time, I'm not concerned as much with how it's performing internally, because I can't change that!
Kevin Laity
And the assumption I'm making is that once I'm done talking to OpenGL, it's smart enough to optimize whatever I gave it for sending to the video card, is this a false assumption?
Kevin Laity
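For reference, one way to make such a timing less misleading is to drain the pipeline with glFinish() before stopping the clock, so the measurement includes GPU work rather than only command submission. A sketch (draw_scene is a placeholder for the 10,000-quad loop):

```c
#include <GL/gl.h>
#include <stdio.h>
#include <time.h>

void draw_scene(void);  /* placeholder: the 10,000-quad drawing loop */

void timed_draw(void)
{
    clock_t start = clock();
    draw_scene();
    glFinish();  /* block until the GPU has completed all issued commands */
    clock_t end = clock();
    printf("%.4f seconds\n", (double)(end - start) / CLOCKS_PER_SEC);
}
```

Without the glFinish(), the timer may stop as soon as the commands are queued, which measures driver submission cost rather than total rendering cost.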
A: 

You would certainly get better performance on the CPU side, simply because far less code gets called.

Whether your drawing performance would be better on the GPU depends entirely on the driver implementation for your 3D graphics card. You could get wildly different results with a different manufacturer's driver, or even with a different version of the driver for the same card.

Jim Buck