This is not really an answer to your question, more of a comment on SLI.
My understanding is that SLI is only really a cost-effective means of gaining performance when you buy two cards right away, which few people actually do. Many people buy an SLI motherboard and a single card thinking it will give them a better upgrade path down the road, but the reality is that by the time you get to that point, it is going to be cheaper to buy a new, faster card than it is to duplicate the one you already have just to get SLI going.
Just a thought before you pour too much energy into it. If you have a requirement to support SLI, then that's what you have to do. But personally, I would rather see optimization energy put towards non-SLI implementations.