Using the RDTSC instruction directly has some severe drawbacks:
- The TSC isn't guaranteed to be synchronized across CPUs, so if your thread or process migrates from one CPU core to another, the TSC may appear to "warp" forward or backward in time unless you use thread/process affinity to prevent migration.
- The TSC isn't guaranteed to advance at a constant rate, particularly on PCs that have power management or "C1 clock ramping" enabled. With multiple CPUs this can increase the skew (for example, if one thread is spinning and another is sleeping, one core's TSC may advance faster than the other's).
- Reading the TSC directly doesn't let you take advantage of the HPET (High Precision Event Timer).
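If you do read the TSC directly, the first two problems are usually mitigated by pinning the thread to one core and checking whether the CPU advertises an "invariant" TSC (CPUID leaf 0x80000007, EDX bit 8), which ticks at a constant rate regardless of power states. The sketch below assumes Linux with GCC or Clang on x86/x86-64: <cpuid.h> and __get_cpuid are compiler-specific, and sched_setaffinity is Linux-specific. The helper names has_invariant_tsc and pin_to_cpu are hypothetical.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t, CPU_SET and sched_setaffinity on glibc */
#include <sched.h>
#include <cpuid.h>    /* GCC/Clang only: __get_cpuid */

/* Returns 1 if the CPU reports an invariant TSC (CPUID leaf 0x80000007,
   EDX bit 8), i.e. the TSC runs at a constant rate across P-/C-state
   changes; returns 0 otherwise. */
static int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the leaf exceeds the highest supported one */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}

/* Pins the calling thread to a single CPU so that successive TSC reads
   come from the same core. Returns 0 on success, -1 on failure (errno set). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* On Linux, pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof set, &set);
}
```

Pinning only prevents migration; it does nothing for the HPET point, and without an invariant TSC the rate can still vary on a single core.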
Using an OS timer interface is better, but it may still have some of the same drawbacks, depending on the implementation.
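For comparison, here is a minimal sketch of going through the OS instead, assuming a POSIX system (on Windows the equivalent would be QueryPerformanceCounter). The kernel chooses the underlying hardware source (TSC, HPET, etc.), so the caller does not handle per-core differences directly. The function name monotonic_ns is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime in strict C modes */
#include <stdint.h>
#include <time.h>

/* Returns a monotonic timestamp in nanoseconds from the OS clock,
   or -1 if clock_gettime fails. CLOCK_MONOTONIC never jumps backward,
   unlike the wall-clock CLOCK_REALTIME. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}
```

Elapsed time is then just the difference of two calls, with no need to know the TSC frequency.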
Also note that Microsoft Visual C++ doesn't support inline assembly when targeting 64-bit processors, hence the __rdtsc() intrinsic that Virne pointed out.
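A minimal sketch of calling the intrinsic portably, assuming an x86 or x86-64 target: MSVC declares __rdtsc() in <intrin.h>, while GCC and Clang provide it through <x86intrin.h>. The wrapper name read_tsc is hypothetical.

```c
#include <stdint.h>

#if !(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86))
#  error "RDTSC is only available on x86/x86-64"
#endif

#if defined(_MSC_VER)
#  include <intrin.h>     /* MSVC: unsigned __int64 __rdtsc(void) */
#else
#  include <x86intrin.h>  /* GCC/Clang: __rdtsc() */
#endif

/* Reads the time-stamp counter through the compiler intrinsic rather than
   inline assembly, which MSVC rejects when targeting x64. */
static inline uint64_t read_tsc(void)
{
    return (uint64_t)__rdtsc();
}
```

The wrapper only replaces the inline assembly; it does not address the migration, rate, or HPET issues listed above.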