I'm writing some micro-benchmarking code for some very short operations in C. For example, one thing I'm measuring is how many cycles are needed to call an empty function depending on the number of arguments passed.

Currently, I'm timing using an RDTSC instruction before and after each operation to get the CPU's cycle count. However, I'm concerned that instructions issued before the first RDTSC may slow down the actual instructions I'm measuring. I'm also worried that the full operation may not be complete before the second RDTSC gets issued.

Does anyone know of an x86 instruction that forces all in-flight instructions to commit before any new instructions are issued? I've been told CPUID might do this, but I've been unable to find any documentation that says so.

+6  A: 

To my knowledge, there is no instruction that specifically "drains" the pipeline. This can easily be accomplished, though, with a serializing instruction.

CPUID is a serializing instruction, which is exactly what you're looking for. Every instruction issued before it is guaranteed to complete before the CPUID instruction executes.

So doing the following should get the desired effect:

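xor %eax, %eax    # CPUID reads eax as its input; use leaf 0 each time
cpuid             # serialize (clobbers eax, ebx, ecx, edx)
rdtsc             # start count in edx:eax; save it before it is overwritten
# stuff
xor %eax, %eax
cpuid             # serialize again so "stuff" has completed
rdtsc             # end count in edx:eax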
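From C, the same sequence might be wrapped in a helper along these lines. This is a minimal sketch, assuming GCC or Clang inline assembly on x86-64; `serialized_rdtsc` is a hypothetical name, not something from a library.

```c
#include <stdint.h>

/* Hypothetical helper (assumes GCC/Clang on x86-64):
 * serialize with CPUID, then read the time-stamp counter. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;   /* CPUID leaf 0, subleaf 0 */
    uint32_t lo, hi;

    /* CPUID is used only for its serializing effect; its outputs are
     * ignored. The "memory" clobber keeps the compiler from moving
     * loads and stores across this point. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");

    /* RDTSC returns the 64-bit counter split across edx:eax. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));

    return ((uint64_t)hi << 32) | lo;
}
```

Calling it once before and once after the code under test and subtracting the two values gives the elapsed count, which includes the cost of the CPUID instructions themselves.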

But, as an aside, I don't recommend that you do this. Your "stuff" can still be affected by many things outside your control (CPU caches, other processes running on the system, etc.), and you'll never be able to eliminate them all. The best way to get accurate performance statistics is to perform the operation(s) you want to measure at least several million times and average the execution time over the batch.
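The batch approach could look roughly like the following. It is a sketch under assumptions: GCC or Clang on x86-64, with `empty_fn`, `read_tsc`, and `average_cycles` all being hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical function under test. noinline plus the empty volatile asm
 * keep the compiler from removing the call. */
__attribute__((noinline)) static void empty_fn(int a, int b)
{
    (void)a;
    (void)b;
    __asm__ __volatile__("");
}

/* Plain RDTSC read (assumes GCC/Clang on x86-64). */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Average TSC ticks per call over `iters` calls, loop overhead included. */
static double average_cycles(uint64_t iters)
{
    uint64_t start = read_tsc();
    for (uint64_t i = 0; i < iters; i++)
        empty_fn(1, 2);
    uint64_t end = read_tsc();
    return (double)(end - start) / (double)iters;
}
```

Something like `average_cycles(10000000)` then gives a per-call figure. Timing an otherwise identical loop without the call and subtracting it is one way to estimate the loop overhead.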

Edit: Most instruction references for CPUID will mention its serializing properties, such as the NASM manual appendix B.

Edit 2: Also might want to take a look at this related question.

SoapBox
You're right about the Fence instructions. CPUID does in fact imply them (I deleted my answer about that). Note that while CPUID serializes, it doesn't clear the caches, which may also affect performance. The cache can be cleared with WBINVD. You might want to add that to your answer.
Nathan Fellman
wbinvd is privileged... you could use clflush though. There's a separate question about that sort of thing from last week: http://stackoverflow.com/questions/558848/can-i-force-cache-coherency-on-a-multicore-x86-cpu/558900#558900
SoapBox
Great answer. Thank you.
Jay Conrod