I'm developing some C++ code that can do some fancy 3D transition effects between two images, for which I thought OpenGL would be the best option.

I start with a DIB section and set it up for OpenGL, and I create two textures from input images.
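
For reference, a minimal sketch of the kind of DIB-backed GL context setup being described (Win32; the helper function is illustrative, not the asker's actual code):

    // Create a 32-bit DIB section and attach an OpenGL context to it.
    // Note: PFD_DRAW_TO_BITMAP selects Microsoft's generic software
    // implementation, which turns out to matter for the question below.
    #include <windows.h>
    #include <GL/gl.h>

    HDC CreateGLDib(int width, int height, void** bits)
    {
        BITMAPINFO bmi = {};
        bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
        bmi.bmiHeader.biWidth       = width;
        bmi.bmiHeader.biHeight      = height;   // bottom-up DIB
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;

        HDC dc = CreateCompatibleDC(NULL);
        HBITMAP dib = CreateDIBSection(dc, &bmi, DIB_RGB_COLORS, bits, NULL, 0);
        SelectObject(dc, dib);

        PIXELFORMATDESCRIPTOR pfd = {};
        pfd.nSize      = sizeof(pfd);
        pfd.nVersion   = 1;
        pfd.dwFlags    = PFD_DRAW_TO_BITMAP | PFD_SUPPORT_OPENGL;
        pfd.iPixelType = PFD_TYPE_RGBA;
        pfd.cColorBits = 32;
        pfd.cDepthBits = 24;
        SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);

        wglMakeCurrent(dc, wglCreateContext(dc));
        return dc;
    }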

Then for each frame I draw just two OpenGL quads, with the corresponding image texture. The DIB content is then saved to file.

For example, one effect is to position the two quads in 3D space like two billboards, one in front of the other (obscuring it), and then swoop the camera up, forward, and down so the second one comes into view.

My input images are around 1024x768, and a frame takes a really long time to render (about 100 milliseconds) when the quads cover most of the view. It speeds up if the camera is far away.

I tried rendering each image quad as hundreds of individual tiles, but it takes just the same time; it seems to depend on the number of visible textured pixels.

I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?

Would I be better off using some other approach?

Thanks in advance...

Edit :

The GL strings for the DIB version show up as:

Vendor: Microsoft Corporation
Version: 1.1.0
Renderer: GDI Generic

The onscreen version shows:

Vendor: ATI Technologies Inc.
Version: 3.2.9756 Compatibility Profile Context
Renderer: ATI Mobility Radeon HD 3400 Series

So I guess I'll have to use FBOs. I'm a bit confused about how to get the rendered data out of the FBO and into a DIB; any pointers (pun intended) on that?

+4  A: 

It sounds like rendering to a DIB is forcing the rendering to happen in software. I'd render to a frame buffer object, and then extract the data from the generated texture. Gamedev.net has a pretty decent tutorial.
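
A minimal sketch of that approach (assuming GL 3.0 framebuffer objects or the equivalent GL_EXT_framebuffer_object entry points; error handling elided):

    // Render into a texture through a framebuffer object instead of a DIB.
    GLuint fbo, colorTex;
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 768, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, NULL);  // allocate storage only
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTex, 0);
    // glCheckFramebufferStatus(GL_FRAMEBUFFER) should report
    // GL_FRAMEBUFFER_COMPLETE before drawing.

    glViewport(0, 0, 1024, 768);
    // ... draw the two textured quads here, hardware-accelerated ...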

Keep in mind, however, that graphics hardware is oriented primarily toward drawing on the screen. Capturing rendered data will usually be slower than displaying it, even when you do get the hardware to do the rendering -- though it should still be quite a bit faster than software rendering.

Edit: Dominik Göddeke has a tutorial that includes code for reading back texture data to CPU address space.
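
In the asker's case the readback could go straight into the DIB section's pixel memory, roughly like this (dibBits stands for the pointer returned by CreateDIBSection; an assumption, not code from either tutorial):

    // Copy the FBO contents back to CPU memory. GL_BGRA matches the 32-bit
    // DIB layout, and both GL and bottom-up DIBs store rows bottom-to-top.
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);   // avoid row-padding surprises
    glReadPixels(0, 0, 1024, 768, GL_BGRA, GL_UNSIGNED_BYTE, dibBits);

    // Alternatively, read the attached texture directly:
    // glBindTexture(GL_TEXTURE_2D, colorTex);
    // glGetTexImage(GL_TEXTURE_2D, 0, GL_BGRA, GL_UNSIGNED_BYTE, dibBits);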

Jerry Coffin
Offscreen FBOs should not be noticeably slower than onscreen buffers on any reasonably recent graphics hardware.
Carlos Scheidegger
@Carlos: Sorry, poor wording on my part. Rendering to the FBO isn't (normally) any slower -- but reading the data from the texture afterwards is usually fairly slow.
Jerry Coffin
@Jerry: you're right if by "reading" you mean a CPU readback. But if all you want is to use those FBOs as GPU textures for later rendering, the performance hit should be negligible (in particular if one uses the same FBO with a larger texture and viewport tricks)
Carlos Scheidegger
@Carlos: right -- when he said "The DIB content is then saved to file.", I assumed he meant CPU readback followed by the CPU writing the data to the file.
Jerry Coffin
@Carlos : I do need to get the rendered bits for every frame. This is part of a module that my client will be integrating into some sort of video application, so in my code I just get 2 DIBs in and return 1 DIB with the contents. I think I will raise the point with my client that performance really won't matter for this bit of code if the intent is to encode it into a video; even simply dumping to disk as BMPs is much, much slower than the rendering process.
rep_movsd
@Carlos : Before I try FBOs, would the speed of copying data from texture memory back to RAM be similar to the BitBlt() speed of copying onscreen contents to a DIB section?
rep_movsd
@rep_movsd: "I do need to get the rendered bits for every frame." Bad idea. Seriously. It isn't suprprising that you have performance problems - 3D apis aren't very good when you frequently exchange data between GPU and system memory. Both Direct3D and OpenGL aren't really created to "load data on card, render something, and get picture back every frame". In this case you could even try to write software rasterizer. Shouldn't be hard...
SigTerm
@Jerry: you're right, I missed that sentence and only read the following one about the billboards. Sorry.
Carlos Scheidegger
@SigTerm : Yeah, I realize now that that's not the way 3D APIs and the underlying hardware are meant to work. The time and money constraints for this project don't warrant trying to write a software rasterizer, and I doubt I could come up with something that performs any better - OpenGL makes it much easier to program. I'm going to stick with it; my worst case is only two images' worth of textures, so I'll leave it at that.
rep_movsd
+2  A: 

One problem with your question:
You provided no actual rendering/texture generation code.

Would I be better off using some other approach?

The simplest thing you can do is make sure your textures have power-of-two sizes, i.e. instead of 1024x768 use 1024x1024 and use only part of that texture (see the sketch below). Explanation: although most modern hardware supports non-power-of-two textures, they are sometimes treated as a special case, and using such a texture MAY cause a performance drop on some hardware.
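
A sketch of that padding approach (tex and imagePixels are placeholders):

    // Allocate power-of-two storage, upload the 1024x768 image into the
    // lower-left corner, and sample only that sub-region of the texture.
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 1024, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, NULL);           // pow2 storage
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 768,
                    GL_BGRA, GL_UNSIGNED_BYTE, imagePixels); // actual image

    // When drawing the quad, map v over [0, 768/1024] instead of [0, 1]:
    const float vMax = 768.0f / 1024.0f;
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
        glTexCoord2f(1.0f, vMax); glVertex2f( 1.0f,  1.0f);
        glTexCoord2f(0.0f, vMax); glVertex2f(-1.0f,  1.0f);
    glEnd();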

I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?

Yes, you're missing one important thing. There are a few things that limit GPU performance:
1. System-memory-to-video-memory transfer rate (probably not your case; it only matters for dynamic textures/geometry whose data changes every frame).
2. Computation cost (if you write a shader with heavy computations, it will be slow).
3. Fill rate (how many pixels the program can put on screen per second); AFAIK this depends on memory speed on modern GPUs.
4. Vertex processing rate (not your case): how many vertices the GPU can process per second.
5. Texture read rate (how many texels per second the GPU can read); on modern GPUs this also depends on GPU memory speed.
6. Texture read caching (not your case): in a fragment shader you can read a texture a few hundred times per pixel with little performance drop IF the coordinates are very close to each other (i.e. almost the same texel in each read), because the results are cached. But performance will drop significantly if you try to access 100 randomly located texels for every pixel.

All those characteristics are hardware dependent.

That is, depending on the hardware you may be able to render 1,500,000 polygons per frame (if they take up a small amount of screen space), but you can bring the fps to its knees with 100 polygons if each polygon fills the entire screen, uses alpha blending, and is textured with a highly detailed texture.

If you think about it, you may notice that there are a lot of video cards that can draw a landscape, but the fps drops when you're doing framebuffer effects (like blur, HDR, etc.).

Also, you may see a performance drop with textured surfaces if you have an integrated GPU. When I fried the PCIe slot on my previous motherboard, I had to work with the integrated GPU (an NVidia 6800 or something). The results weren't pleasant. While the GPU supported shader model 3.0 and could run relatively computationally expensive shaders, the fps dropped rapidly whenever there was a textured object on screen. This obviously happened because the integrated GPU used part of system memory as video memory, and the transfer rates of "normal" GPU memory and system memory are different.

SigTerm
You don't need to resize the texture. The power-of-2 requirement only really applies to the width; the height is irrelevant, since a texel is addressed as (x + y * width), and if the width is a power of 2 this becomes (x + (y << width_shift)).
Skizz
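
(For illustration, Skizz's addressing claim in C++; the function names are hypothetical:)

    // With a power-of-two width, the row multiply reduces to a shift.
    unsigned texelIndex(unsigned x, unsigned y, unsigned width) {
        return x + y * width;            // general case
    }
    unsigned texelIndexPow2(unsigned x, unsigned y, unsigned widthShift) {
        return x + (y << widthShift);    // width == 1u << widthShift
    }
    // texelIndex(x, y, 1024) == texelIndexPow2(x, y, 10) for all x, y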
@Skizz: "The power-of-2 only really applies to the width, the height is irrelevant" It is the first time I hear something like this (applied to actual hardware). So, can you support your argument with words from actual GPU manufacturerer, Microsoft or from openGL documentation? Don't forget that texture coordinates wrap around, and that warpping around is easier to do for pow2 coordinates.
SigTerm
Yes, I tried various texture flags and it does make a difference of about 1.5x to 2x for rendering - using glTexImage2D rather than gluBuild2DMipmaps. Moreover, gluBuild2DMipmaps is very costly to call for every frame (since my input images change dynamically) - maybe because it creates several mipmaps by scaling the image? glTexImage2D is about 30 times faster. Though I'm aware the results won't look as good as with gluBuild2DMipmaps, I can live with that.
rep_movsd
@rep_movsd: OpenGL supports automatic mipmap generation starting from version 1.4. "glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);". See http://www.opengl.org/sdk/docs/man/xhtml/glTexParameter.xml
SigTerm
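
(A sketch of the driver-generated mipmap path SigTerm mentions; w, h, and framePixels are placeholders:)

    // Let the driver rebuild the mip chain on every upload (GL 1.4+),
    // avoiding a costly per-frame gluBuild2DMipmaps call.
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, framePixels);  // mips regenerate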
@SigTerm: I might have been thinking a bit old-school there. Anyway, thinking about it, don't modern GPUs use floating-point texel coordinates, where (0,0) is the top left and (1,1) is the bottom right, and then scale the value to the texture size? In that case it probably makes little difference what the size is. The only caveat would be that bigger textures require more memory bandwidth and more RAM pages, which means slower rendering.
Skizz
@Skizz: "it probably makes little difference what the size is" the problem is with "probably" - there is uncertainty. Modern GPU is a "black box". You feed data as floats, and you "see" some data as floats within shader code. But it is unknown how exactly the data is represented internally, and how exactly it calculates texel coordinates. Anyway, float or not, I think that multiplication by power of two will be faster. Also once or twice I had reports of performance drops with non-pow2 textures on ati hardware.
SigTerm
@Skizz: " bigger textures require more memory bandwidth and more RAM pages, which means slower rendering" That is incorrect. While more bandwidth will slow down rendering, more RAM pages doesn't affect rendering speed at all. In game development, it is commonly recommended to use large texture "atlases" - huge textures (4096x4096 or larger) that contain multiple smaller textures (say, 1024 128x128 textures). The purpose of atlases is to avoid calling CPU-related functions and make object "batches" (to speed things up). ...
SigTerm
@Skizz: ... rendering speed depends on how many texels you're reading; it doesn't depend on texture size. Plus, even with huge textures, there is mipmapping, which significantly speeds things up by reducing the number of texels read. Also, texture reads are cached. As a result, you'll get slower rendering if (in a fragment shader) you're accessing 100 texels randomly spread across the texture than when you're accessing 100 texels that are very close to each other. Anyway, it is a broad topic, and I think that assuming anything (that isn't documented) about the actual GPU isn't a good idea.
SigTerm