ansaurus

Question

When are VBOs faster than "simple" OpenGL primitives (glBegin())?

Answer 1

+8 A:

There are a lot of factors in optimizing 3D rendering. Usually there are 4 bottlenecks:

cpu (creating vertices, api calls, everything else)
bus (cpu-gpu transfer)
vertex (vertex shader or fixed-function pipeline execution)
pixel (fill, fragment shader execution and ROPs)

Your test is giving skewed results because you have plenty of CPU (and bus) headroom while maxing out vertex or pixel throughput. VBOs are used to lower CPU load (fewer API calls, and DMA transfers that run in parallel with the CPU). Since you are not CPU bound, they don't give you any gain. This is optimization 101. In a game, for example, CPU time becomes precious because it is needed for other things like AI and physics, not just for issuing tons of API calls. It is easy to see that writing vertex data (3 floats, for example) directly to a memory pointer is much faster than calling a function that writes 3 floats to memory: at the very least you save the cycles for the call.
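The last point can be illustrated without any OpenGL at all. Below is a minimal sketch, assuming a hypothetical `emit_vertex` function as a stand-in for a per-vertex API call such as glVertex3f; the function names are made up for illustration. Both paths produce identical data, and the only difference is the per-vertex call. In this toy a compiler may inline the call away, which a real driver entry point does not allow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-vertex API call such as glVertex3f.
// The counter only makes the number of calls visible.
static std::size_t g_calls = 0;

void emit_vertex(std::vector<float>& out, float x, float y, float z) {
    ++g_calls;
    out.push_back(x);
    out.push_back(y);
    out.push_back(z);
}

// Immediate-mode style: one function call per vertex.
std::vector<float> fill_by_calls(std::size_t n) {
    std::vector<float> out;
    out.reserve(n * 3);
    for (std::size_t i = 0; i < n; ++i) {
        emit_vertex(out, static_cast<float>(i), 0.0f, 0.0f);
    }
    return out;
}

// Buffer style: write 3 floats per vertex straight through a pointer,
// as one would into a mapped buffer. No per-vertex call is made.
std::vector<float> fill_by_pointer(std::size_t n) {
    std::vector<float> out(n * 3);
    float* p = out.data();
    for (std::size_t i = 0; i < n; ++i) {
        *p++ = static_cast<float>(i);
        *p++ = 0.0f;
        *p++ = 0.0f;
    }
    assert(p == out.data() + out.size());  // wrote exactly n * 3 floats
    return out;
}
```

Calling `fill_by_calls(n)` makes n calls to `emit_vertex`, while `fill_by_pointer(n)` makes none and yields the same contents.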

starmole 2009-01-10 05:46:40

My understanding was that Vertex Arrays (GL 1.1) were used to lower CPU (minimize function calls), while VBOs built on that to also knock down bus activity. I thought that my experiment would be bus bound (or CPU bound for simple glBegin() drawing) but I guess I was wrong. Can you comment? Thanks!

Drew Hall 2009-01-10 18:40:06

VBOs will only lower bus activity if your geometry is static, that is, reused across frames. Also make sure that you flag them write-only in that case. Further reading: http://developer.nvidia.com/object/using_VBOs.html
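A rough way to see why only static geometry benefits is to count bytes. This is a deliberately simplified model, assuming position-only vertices of 3 floats (4 bytes each) and ignoring driver caching, command overhead and index data. The function names are illustrative and not part of any API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: each vertex is 3 floats (x, y, z) of 4 bytes each.
static_assert(sizeof(float) == 4, "model assumes 4-byte floats");
const std::uint64_t kBytesPerVertex = 3 * sizeof(float);

// Geometry that is re-sent every frame (immediate mode, client-side
// vertex arrays, or a VBO that is refilled each frame).
std::uint64_t bytes_resent_each_frame(std::uint64_t vertices, std::uint64_t frames) {
    return vertices * kBytesPerVertex * frames;
}

// Static VBO: uploaded once, then reused on every later frame.
std::uint64_t bytes_uploaded_once(std::uint64_t vertices) {
    return vertices * kBytesPerVertex;
}

// How many times more data crosses the bus when nothing is reused.
double traffic_ratio(std::uint64_t vertices, std::uint64_t frames) {
    assert(vertices > 0);
    return static_cast<double>(bytes_resent_each_frame(vertices, frames)) /
           static_cast<double>(bytes_uploaded_once(vertices));
}
```

In this model the ratio is simply the number of frames over which the geometry is reused, so data that changes every frame gains nothing on the bus side.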

starmole 2009-01-14 04:38:19

Drew Hall 2009-01-14 04:52:15

@starmole: I tried mapping as WRITE_ONLY (actually, tried all three modes) with no effect. Did a quick map/unmap immediately after glBufferData--did I do that right? Still confused... :(

Drew Hall 2009-01-14 05:16:34

Answer 2

+1 A:

As a side note:
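The "direct mode" (usually called immediate mode, i.e. glBegin/glEnd) is not supported in:

OpenGL ES.

OpenGL 3.x (deprecated in 3.0 and removed from the core specification from 3.1 on; compatibility contexts may still offer it).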

So if you ever plan to port your application to a mobile platform (e.g. iPhone), don't even get used to it.

I teach OpenGL at University and the slide explaining glBegin/glEnd has a big red box around it with an extra bold "DO NOT USE" header.

Using vertex arrays is just two lines more and you save cycles right from the start.

Andreas 2009-01-13 18:53:35

Thanks. I included the direct mode calls just so I could see the VBO performance improvement in all its glory (best vs. worst performance).

Drew Hall 2009-01-14 03:07:47

Answer 3

+1 A:

From reading the Red Book, I remember a passage that stated that VBOs are possibly faster depending on the hardware. Some hardware optimizes those, while others don't. It's possible that your hardware doesn't.

Will Mc 2009-01-13 19:21:41

Thanks. It's hard to see how keeping the data resident on the card wouldn't always be faster (even without significant extra optimization in the driver), but I guess I'm having trouble getting my code to be "bus bound".

Drew Hall 2009-01-14 03:11:53

@Will Mc (again): Also hard to imagine that Nvidia wouldn't be somewhere near the cutting edge in terms of implementing VBO optimizations. Seems more likely that they've (also) found a way to optimize the direct path to me.

Drew Hall 2009-01-14 03:15:55

Answer 4

A:

Assuming I remember this right, my OpenGL teacher, who is really well known in the OpenGL community, said they are faster on static geometry that is going to be rendered many times. In a typical game this will be tables, chairs and small static entities.

2009-01-13 19:24:39

Answer 5

+1 A:

There might be a few things missing:

1) It's a wild guess, but your laptop's card might not support this kind of operation in hardware at all (i.e. the driver emulates it).

2) Are you copying the data to the GPU's memory (via glBufferData on GL_ARRAY_BUFFER with either GL_STATIC_DRAW or GL_DYNAMIC_DRAW), or are you using a pointer to an array in main (non-GPU) memory? The latter requires copying it every frame, and therefore performance is slow.

3) Are you passing the indices as another buffer, sent via glBufferData on GL_ELEMENT_ARRAY_BUFFER?

If those three things are done, the performance gain is big. For Python (via PyOpenGL) it's about 1000 times faster on arrays bigger than a couple of hundred elements; for C++ it's up to 5 times faster, but only on arrays of 50k-10m vertices.

Here are my test results for C++ (Core2Duo/8600GTS), in frames per second:

points  VBO fps  glBegin/glEnd fps  ratio
   100     3900               3900   1.00
    1k     3800               3200   1.18
   10k     3600               2700   1.33
  100k     1500                400   3.75
    1m      213                 49   4.34
   10m       24                  5   4.80

So even with 10m vertices the VBO path ran at a normal frame rate, while glBegin/glEnd was sluggish.

Slava N 2009-02-14 15:14:46

Answer 6

A:

14 Mpoints/s is not a whole lot. It's suspect. Can we see the complete code doing the drawing, as well as the initialisation? (Compare that 14M/s to the 240M/s (!) that Slava Vishnyakov gets.) It's even more suspicious that it drops to 640K/s for 1K draws (compared with his 3.8M/s, which looks capped by the ~3800 SwapBuffers anyway).

I'd bet the test does not measure what you think it measures.
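The throughput figures quoted here follow from multiplying the point count by the frame rate. A small sketch, assuming the vbo column in Slava's table is frames per second (the table does not label its units, so that is an inference), with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Points drawn per second = points per frame * frames per second.
std::uint64_t points_per_second(std::uint64_t points_per_frame, std::uint64_t fps) {
    assert(fps > 0);
    return points_per_frame * fps;
}
```

With the table's VBO rows, `points_per_second(10000000, 24)` gives 240,000,000 and `points_per_second(1000, 3800)` gives 3,800,000, which match the 240M/s and 3.8M/s mentioned above.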

Bahbar 2009-11-19 21:10:05
