This MSR (Microsoft Research) paper is a good start. The authors also document a number of point tools, such as IOSpeed and FragDisk, which you can use and test in your environment.
There is also an updated report/presentation you can read about how to maximise sequential IO. Very interesting stuff, as they debunk the "moving the HD head is the most time consuming operation" myth. They also fully document their test environments and associated configurations, down to the motherboard, RAID controller and virtually any other relevant information you would need to replicate their work. One of the highlights is how an Opteron and a Xeon matched up, and they then also compared them to an insane, hyped NEC Itanium box (32 or 64 procs or something) for good measure. From the second link you can find a lot more resources on how to test and evaluate high-throughput scenarios and needs.
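If you just want a quick feel for sequential read throughput before digging into the tools from the paper, here is a minimal C++ sketch. It is not IOSpeed or anything from the MSR work; the names `write_test_file`, `sequential_read` and `ReadResult` are made up for illustration. It reads a file front to back with a given request size and reports MB/s. Keep in mind that a file you have just written will mostly come out of the OS cache, so the numbers only say something about the disk if the file is much larger than RAM or caching is bypassed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: create a scratch file of total_bytes to read back.
bool write_test_file(const std::string& path, std::size_t total_bytes) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    std::vector<char> chunk(1 << 20, 'x');  // write in 1 MiB pieces
    std::size_t left = total_bytes;
    bool ok = true;
    while (left > 0 && ok) {
        std::size_t n = left < chunk.size() ? left : chunk.size();
        ok = std::fwrite(chunk.data(), 1, n, f) == n;
        left -= n;
    }
    return std::fclose(f) == 0 && ok;
}

struct ReadResult {
    std::size_t bytes = 0;   // total bytes read
    double seconds = 0.0;    // wall-clock time spent reading
    double mb_per_sec() const {
        return seconds > 0.0 ? (bytes / (1024.0 * 1024.0)) / seconds : 0.0;
    }
};

// Read the whole file sequentially, issuing requests of block_size bytes.
ReadResult sequential_read(const std::string& path, std::size_t block_size) {
    assert(block_size > 0);
    ReadResult r;
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return r;  // bytes == 0 signals failure to the caller
    // Turn off stdio buffering so block_size is the size actually requested.
    std::setvbuf(f, nullptr, _IONBF, 0);
    std::vector<char> buf(block_size);
    auto t0 = std::chrono::steady_clock::now();
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        r.bytes += n;
    }
    auto t1 = std::chrono::steady_clock::now();
    r.seconds = std::chrono::duration<double>(t1 - t0).count();
    std::fclose(f);
    return r;
}
```

Call `sequential_read` on the same file with a few request sizes (say 4 KB, 64 KB, 1 MB) and compare `mb_per_sec()`. On Windows you would probably want to go through `CreateFile` with `FILE_FLAG_NO_BUFFERING` instead of stdio to take the cache out of the picture, which this sketch does not do.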
Some of the other MSR papers on this same research topic give guidance about where to maximise your spending (e.g. RAM, CPU, disk spindles, etc.) to accommodate your usage patterns. All very neat.
Some of it is dated, but the older APIs are usually the faster, lower-level ones anyhow ;)
I currently push hundreds of thousands of TPS on a purpose-built app server, using a mix of C#, C++/CLI, native code and bitmap caching (rtl*bitmap).
Take care;