How credible are benchmarks carried out within a virtual machine, as opposed to on real hardware?
Let's dissect a specific situation. Assume we want to benchmark the performance impact of a recent code change. Assume for simplicity that the workload is fully CPU-bound (though IO-bound and mixed workloads are also of interest). Assume the machine is running under VirtualBox, because it's the best one ;)
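For concreteness, here's roughly the kind of measurement I have in mind: a minimal sketch, where do_work() is just a stand-in for the code under test (not my actual workload) and repeated runs with a median are used to dampen VM timing jitter.

```python
import statistics
import time

def do_work():
    # Placeholder CPU-bound workload; the real code change would go here.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

def bench(fn, repeats=20):
    """Time fn() several times and return the median wall-clock duration."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to scheduling/VM timing jitter than the mean.
    return statistics.median(samples)

if __name__ == "__main__":
    print(f"median run time: {bench(do_work):.4f} s")
```

The same harness would be run once with the original code and once with the new code, inside the VM and on bare metal, and the relative difference compared.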
Assume that we measured the original code and the new code, and the new code was 5% faster (when benchmarked in the virtual machine). Can we safely claim that it will be at least 5% faster on real hardware too?
And, even more importantly, assume that the new code is 3% slower. Can we be sure that on real hardware it will be at most 3% slower, and definitely not worse than that?
UPDATE: What I'm most interested in is your battlefield results. I.e., have you witnessed a case where code that was 10% slower in a VM performed 5% faster on real iron, or vice versa? Or has it always been consistent (i.e., if it's faster/slower in a VM, it's always proportionally faster/slower on the real machine)? My results have been more or less consistent so far; at the very least, they always go in the same direction.