I'm writing micro-benchmarking code for some very short operations in C. For example, one thing I'm measuring is how many cycles it takes to call an empty function, depending on the number of arguments passed.
Currently, I time each operation by executing an RDTSC instruction before and after it to read the CPU's cycle count. However, I'm concerned that instructions issued before the first RDTSC may still be in flight and slow down the instructions I'm actually measuring. I'm also worried that the operation may not have fully completed before the second RDTSC is issued.
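To make the setup concrete, here is a minimal sketch of the kind of measurement I mean (not my exact code). It assumes GCC or Clang on x86 and uses the `__rdtsc()` intrinsic from `<x86intrin.h>`; `empty0`, `empty2`, `time_empty0`, `time_empty2`, and `min_of` are names made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang targeting x86 */

/* Hypothetical functions under test. noinline keeps the call a real call,
   and the empty asm statement stops the compiler from deleting it. */
__attribute__((noinline)) void empty0(void)
{
    __asm__ volatile("");
}

__attribute__((noinline)) void empty2(int a, int b)
{
    (void)a;
    (void)b;
    __asm__ volatile("");
}

/* Read the counter, make one call, read the counter again.
   Nothing here prevents the CPU from reordering the two RDTSC
   reads relative to the call, which is exactly my concern. */
static uint64_t time_empty0(void)
{
    uint64_t start = __rdtsc();
    empty0();
    uint64_t end = __rdtsc();
    return end - start;
}

static uint64_t time_empty2(void)
{
    uint64_t start = __rdtsc();
    empty2(1, 2);
    uint64_t end = __rdtsc();
    return end - start;
}

/* Repeat a measurement and keep the smallest value, to filter out
   interrupts and cache misses on the first iterations. */
static uint64_t min_of(uint64_t (*timer)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t = timer();
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling something like `min_of(time_empty0, 1000)` and `min_of(time_empty2, 1000)` gives the two numbers I would then compare.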
Does anyone know of an x86 instruction that forces all in-flight instructions to commit before any new instructions are issued? I've been told CPUID might do this, but I've been unable to find any documentation that says so.
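For reference, this is roughly what the CPUID suggestion would look like if I applied it. It is only a sketch under the assumption that CPUID really does act as a barrier, which is the part I have not been able to confirm. It assumes GCC or Clang inline assembly on x86, and `cpuid_fence` and `fenced_rdtsc` are names I made up.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang targeting x86 */

/* Execute CPUID (leaf 0) purely for its supposed serializing side effect.
   The results are discarded. The "memory" clobber also stops the compiler
   from moving memory accesses across this point. */
static inline void cpuid_fence(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)ebx;
    (void)edx;
}

/* Read the timestamp counter only after the CPUID has executed. */
static inline uint64_t fenced_rdtsc(void)
{
    cpuid_fence();
    return __rdtsc();
}
```

The idea would be to replace each plain RDTSC in the benchmark with `fenced_rdtsc()`, accepting that CPUID itself adds overhead that would have to be measured and subtracted.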