Hi there.
I have a running 3D engine built in D3D (via SlimDX). To avoid interrupting the rendering pipeline I have batched together many objects with the same material into bigger meshes (to reduce state switching). This has worked well and gives a good performance level for my needs.
The issue I am running into is that I need to change material properties on some subsets of those larger batched meshes at runtime. I'm using the attribute buffer for that and it has been working reasonably well. Until now I have used a limited number of active attributes (about 5 per mesh), but I now need many more material variations (different shades of opacity/color blends) and could end up with a hundred or more combinations. Since those changes happen at runtime, I can't bundle them together before rendering starts. Sure, I could reconstruct the meshes, but I would rather not, since that is slow and switching back and forth between materials needs to happen at interactive speeds.
So my question is, what is the best route to take?
Should I implement a more robust attribute handling system that dynamically masks faces with available attribute IDs on demand and then resets them when done? I have heard that fragmentation in the attribute buffer adds a performance hit, and I am also unsure about the cost of successive DrawSubset() calls with material switches in between (i.e. how many is too many, and when should I optimize my attribute arrays?). Does anyone have experience with this?
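To make the first idea more concrete, here is a rough sketch of the bookkeeping I have in mind. It is plain C++ rather than SlimDX/C#, and every name in it (AttributeIdPool, MaterialKey, acquire, release) is hypothetical and not part of D3DX or SlimDX. It only tracks which attribute ID is bound to which material variation; actually writing the IDs into the attribute buffer is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical material key, e.g. a packed RGBA value per distinct look.
using MaterialKey = std::uint32_t;

// Hands out a limited set of attribute IDs to material variations on demand
// and takes them back when no face uses the variation any more.
class AttributeIdPool {
public:
    explicit AttributeIdPool(std::uint32_t capacity) {
        // Fill the free list so that ID 0 is handed out first.
        for (std::uint32_t id = capacity; id > 0; --id) free_.push_back(id - 1);
    }

    // Returns the ID already bound to this material, or binds a free one.
    // Returns -1 when every ID is in use.
    int acquire(MaterialKey key) {
        auto it = byKey_.find(key);
        if (it != byKey_.end()) {
            ++it->second.refs;
            return static_cast<int>(it->second.id);
        }
        if (free_.empty()) return -1;
        std::uint32_t id = free_.back();
        free_.pop_back();
        byKey_[key] = Slot{id, 1};
        return static_cast<int>(id);
    }

    // Drops one reference; the ID goes back to the pool when unused.
    void release(MaterialKey key) {
        auto it = byKey_.find(key);
        if (it == byKey_.end()) return;
        assert(it->second.refs > 0);
        if (--it->second.refs == 0) {
            free_.push_back(it->second.id);
            byKey_.erase(it);
        }
    }

    std::size_t activeCount() const { return byKey_.size(); }

private:
    struct Slot { std::uint32_t id; std::uint32_t refs; };
    std::map<MaterialKey, Slot> byKey_;
    std::vector<std::uint32_t> free_;
};
```

The reference count is there so that two selections sharing the same shade reuse one ID, which should keep the number of distinct subsets (and so DrawSubset() calls) per mesh down. Whether this is enough to avoid the fragmentation cost is exactly what I am unsure about.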
My other idea is to use a parametrized pixel shader. I don't need any fancy effects, just the bare minimum (currently I use the built-in flat shading with color only, plus transparency on some objects), so shader model 1 is more than enough for my needs. The idea is to use one all-purpose shader and, instead of switching materials between calls, just alter some shader parameters. But I don't know whether this is faster than switching materials, or whether programmable shaders are slower than the built-in ones (given the same result).
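Roughly, the draw loop for the shader idea would look like the sketch below. Again this is plain C++ with a stand-in device so it can run on its own; FakeDevice, setColorConstant, drawSubset and MaterialParams are made-up names for illustration, not SlimDX or D3D calls. The assumption is that the shader is bound once per mesh and only a color/opacity constant changes between subsets.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-subset parameters fed to the single all-purpose shader.
struct MaterialParams { float r, g, b, opacity; };

bool sameParams(const MaterialParams& a, const MaterialParams& b) {
    return a.r == b.r && a.g == b.g && a.b == b.b && a.opacity == b.opacity;
}

struct Subset { int attributeId; MaterialParams params; };

// Stand-in for the real device: it only counts calls.
struct FakeDevice {
    int constantUploads = 0;
    int drawCalls = 0;
    void setColorConstant(const MaterialParams&) { ++constantUploads; }
    void drawSubset(int) { ++drawCalls; }
};

// Draws every subset, uploading the shader constant only when it differs
// from the one used for the previous subset.
void drawAll(FakeDevice& dev, const std::vector<Subset>& subsets) {
    const MaterialParams* last = nullptr;
    for (const Subset& s : subsets) {
        assert(s.attributeId >= 0);
        if (last == nullptr || !sameParams(*last, s.params)) {
            dev.setColorConstant(s.params);
            last = &s.params;
        }
        dev.drawSubset(s.attributeId);
    }
}
```

If the subsets are sorted so that equal parameters are adjacent, the number of constant uploads drops to the number of distinct shades actually in use. What I can't judge is how the cost of one such upload compares to a fixed-function material switch on older hardware.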
I'm also curious about the difference in performance hit between switching meshes and drawing different subsets of one big mesh (given the same number of material switches in both cases).
I understand that the answers may differ somewhat between graphics cards depending on their performance and age, but I'm just looking for general guidelines on where to focus most of my effort (i.e. which types of state switches or CPU interference cause the biggest GPU hit). Memory is also a concern, so any implementation that duplicates whole meshes (or large parts of them) is not possible for me.
My focus is performance on older (around 5 years), less capable, or integrated graphics cards, not necessarily top-of-the-line gaming cards or workstation cards (like Quadro). I guess that could make or break the shader-based solution, depending on how good the shader performance is on a particular board.
Any and all suggestions and feedback are greatly appreciated.
Many thanks in advance!