I am developing an application that needs to read back the whole frame from the front buffer of an OpenGL application. I can hijack the application's OpenGL library and insert my code on SwapBuffers. At the moment I am successfully using a simple but excruciatingly slow glReadPixels call without PBOs.

Now I have read about using multiple PBOs to speed things up. While I think I've found enough resources to actually program that (it isn't that hard), I have some operational questions left. I would do something like this:

  1. Create a series (e.g. 3) of PBOs.
  2. Use glReadPixels in my SwapBuffers override to read data from the front buffer into a PBO (should be fast and non-blocking, right?).
  3. Create a separate thread to call glMapBufferARB, once per PBO after a glReadPixels, because this will block until the pixels are in client memory.
  4. Process the data from step 3.

My main concern is of course with steps 2 and 3. I read that glReadPixels into a PBO is non-blocking. Will that be a problem if I issue new OpenGL commands very soon afterwards? Will those commands block, or will they continue (my guess)? If they continue, I guess only SwapBuffers can be a problem: will it stall, will glReadPixels from the front buffer be many times faster than the swap interval (about 15 to 30 ms), or, in the worst case, will SwapBuffers be executed while glReadPixels is still reading data into the PBO? My current guess is that the logic does something like this: copy FRONT_BUFFER -> some generic place in VRAM, then copy VRAM -> RAM. But I have no idea which of those two is the real bottleneck, or what the influence on the normal OpenGL command stream is.
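To put a number on the VRAM -> RAM step, here is a rough sketch of the sustained throughput a full-frame readback needs. The 1920x1080 RGBA frame is an assumption chosen for illustration; only the 15 to 30 ms swap interval comes from the question, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one frame in bytes. */
static uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes per second needed to read back one frame every `frame_ms` milliseconds. */
static uint64_t readback_bytes_per_second(uint64_t bytes, uint32_t frame_ms)
{
    assert(frame_ms > 0);
    return bytes * 1000u / frame_ms;
}

/* Assumed example: 1920 x 1080, 4 bytes per pixel (RGBA, 8 bits per channel).
 *   frame_bytes(1920, 1080, 4)              = 8294400 bytes per frame
 *   readback_bytes_per_second(8294400, 15)  = 552960000 bytes/s
 *   readback_bytes_per_second(8294400, 30)  = 276480000 bytes/s
 */
```

Comparing these figures with the throughput actually measured for glReadPixels on the target machine gives a first hint of whether the transfer itself or the synchronization stall dominates.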

Then step 3: is it wise to do this asynchronously in a thread separate from the normal OpenGL logic? At the moment I think not. It seems you have to restore buffer operations to normal after doing this, and I can't install synchronization objects in the original code to temporarily block them. So I think my best option is to define a certain delay, counted in SwapBuffers calls, before reading the data out, e.g. calling glReadPixels on PBO i%3 and glMapBufferARB on PBO (i+1)%3 in the same thread, resulting in a delay of 2 frames. Also, when I call glMapBufferARB to use the data in client memory, will that be the bottleneck, or will the (asynchronous) glReadPixels be the bottleneck?
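The slot arithmetic behind such a rotation can be sketched as follows. The helper names are hypothetical and no OpenGL calls are made. With `count` PBOs, frame i writes slot i % count, and the slot filled `delay` frames earlier is (i + count - delay) % count. With three buffers that is (i+2)%3 for a delay of one frame and (i+1)%3 for a delay of two frames.

```c
#include <assert.h>

/* Slot that receives this frame's glReadPixels. */
static unsigned pbo_write_slot(unsigned frame, unsigned count)
{
    assert(count > 0);
    return frame % count;
}

/* Slot that was filled `delay` frames ago and is the one to map now.
 * Requires 0 < delay < count, so the mapped slot is never the slot
 * being written in the same frame. The caller should skip mapping
 * during the first `delay` frames, when that slot has not been filled yet. */
static unsigned pbo_map_slot(unsigned frame, unsigned count, unsigned delay)
{
    assert(count > 0 && delay > 0 && delay < count);
    return (frame % count + count - delay) % count;
}
```

A larger delay gives the GPU more time to finish the transfer before the map call, at the cost of extra latency and one more buffer.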

And finally, if you have any better ideas to speed up frame readback from the GPU in OpenGL, please tell me, because this is a painful bottleneck in my current system.

I hope my question is clear enough. I know the answer is probably somewhere on the internet, but I mostly came up with results that use PBOs to keep buffers in video memory and do the processing there. I really need to read the front buffer back to RAM, and I cannot find any clear explanation of performance in that case (which I need: I cannot rely on "it's faster", I need to explain why it's faster).

Thank you

+1  A: 

Are you sure you want to read from the front buffer? You do not own this buffer, and depending on your OS it might be destroyed, e.g., by another window on top of it.

For your use case, people typically do

  • draw N
  • start PBO read N from back buffer
  • draw N+1
  • start PBO read N+1
  • sync PBO read N
  • process N
  • ...

from a single thread.
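The ordering above can be sketched as a single-threaded driver loop. This is only an illustration: the hook structure and all names are hypothetical, and the OpenGL calls a real implementation would make are mentioned in comments only. The tracing hooks exist purely to show the call order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hooks; a real program would wrap the GL calls named below. */
typedef struct {
    void (*draw)(unsigned frame, void *user);
    /* e.g. glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[slot]); glReadPixels(..., (void *)0); */
    void (*start_read)(unsigned frame, unsigned slot, void *user);
    /* e.g. glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY); process; glUnmapBuffer(...); */
    void (*sync_and_process)(unsigned frame, unsigned slot, void *user);
} readback_hooks;

/* Keeps up to pbo_count - 1 reads in flight before syncing the oldest one. */
static void run_readback(const readback_hooks *h, unsigned frames,
                         unsigned pbo_count, void *user)
{
    assert(h != NULL && pbo_count >= 2);
    unsigned delay = pbo_count - 1;
    for (unsigned n = 0; n < frames; ++n) {
        h->draw(n, user);
        h->start_read(n, n % pbo_count, user);
        if (n >= delay) {
            unsigned old = n - delay;
            h->sync_and_process(old, old % pbo_count, user);
        }
    }
    /* Drain the reads that are still outstanding after the last frame. */
    for (unsigned old = frames > delay ? frames - delay : 0; old < frames; ++old)
        h->sync_and_process(old, old % pbo_count, user);
}

/* Tracing hooks: append "<tag><frame> " to a text log instead of calling GL. */
typedef struct { char text[256]; } trace_log;

static void trace_append(trace_log *log, const char *tag, unsigned frame)
{
    size_t used = strlen(log->text);
    snprintf(log->text + used, sizeof log->text - used, "%s%u ", tag, frame);
}

static void trace_draw(unsigned f, void *u) { trace_append(u, "draw", f); }
static void trace_read(unsigned f, unsigned s, void *u) { (void)s; trace_append(u, "read", f); }
static void trace_sync(unsigned f, unsigned s, void *u) { (void)s; trace_append(u, "sync", f); }

static const readback_hooks trace_hooks = { trace_draw, trace_read, trace_sync };
```

With two buffers, `run_readback(&trace_hooks, 3, 2, &log)` on an empty log records the order draw0, read0, draw1, read1, sync0, draw2, read2, sync1, sync2, which matches the list above. Passing a larger `pbo_count` keeps more frames in flight.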

eile
I do know I want to read from the front buffer (or from the back buffer just before the swap call); this isn't actually the problem. Your answer helps me with my question about threading, which I had already kind of suspected. But I would really like to know what happens behind the scenes on the GPU when you do this: what gets blocked, what doesn't, whether it is reasonable to use more than two PBOs, and so on. Thanks for the answer anyway :)!
KillianDS
Using the back buffer is the better option. What goes on behind the scenes is driver-dependent. Typically the async read gets posted to the GPU FIFO, where the GPU processes it when its turn comes. Upon completion, the GPU sends back a token telling the driver the operation has finished, which unblocks the buffer mapping call. For anything more specific, you'll have to talk to an NVIDIA/AMD engineer. It might make sense to use more than two buffers, depending on how many frames you want to have 'in flight'. Simply make the buffer count configurable and benchmark it.
eile