I hacked together the following code:

unsigned long long get_cc_time () volatile {
  uint64 ret;
  __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
  return ret;
}

It works on g++ but not on Visual Studio. How can I port it? What are the right macros to detect VS / g++?

+2  A: 

VC++ has a _MSC_VER macro that MSDN describes as "Microsoft specific", so presumably it is not defined when the code is compiled with other compilers. You can use #ifdef to determine which compiler is in use and compile different code for gcc and VC++.

#ifdef _MSC_VER
    //VC++ version
#else
    //gcc version
#endif
sharptooth
+3  A: 
#if defined(_MSC_VER)
// visual c
#elif defined(__GNUC__)
// gcc / g++
#else
// unknown
#endif

My inline assembler skills are rusty, but in Visual C++ the syntax looks like this:

__asm
{
// some assembler code
}

But if all you need is rdtsc, you can use the intrinsic instead:

#include <intrin.h>

unsigned __int64 counter;
counter = __rdtsc();

http://msdn.microsoft.com/en-us/library/twchhe95.aspx
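
For reference, here is a rough sketch of how the compiler check and the intrinsic might be combined into one function. GCC and Clang also provide __rdtsc() on x86 through <x86intrin.h>, so the same call can be used on Linux. The names read_tsc and EXAMPLE_HAVE_RDTSC are made up for this example, and the non-x86 fallback to std::chrono::steady_clock is only an assumption added so the code compiles everywhere; it returns clock ticks, not CPU cycles.

```cpp
#include <cstdint>

// Pick the header that declares __rdtsc() for the current compiler.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>        // Visual C++ on x86 / x64
#define EXAMPLE_HAVE_RDTSC 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>     // GCC and Clang on x86 / x86-64
#define EXAMPLE_HAVE_RDTSC 1
#else
#include <chrono>          // assumption: no TSC available on this target
#define EXAMPLE_HAVE_RDTSC 0
#endif

// Hypothetical helper: returns the raw time-stamp counter on x86.
// On other targets it falls back to steady_clock ticks (not cycles).
inline std::uint64_t read_tsc() {
#if EXAMPLE_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

A caller would take the difference of two read_tsc() values around the code being measured. The result is only meaningful if both reads happen on the same core.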

Virne
Thanks! Is there a Linux variant of the rdtsc intrinsic?
Łukasz Lew
+2  A: 

Using the RDTSC instruction directly has some severe drawbacks:

  • The TSC isn't guaranteed to be synchronized on all CPUs, so if your thread/process migrates from one CPU core to another the TSC may appear to "warp" forward or backward in time unless you use thread/process affinity to prevent migration.
  • The TSC isn't guaranteed to advance at a constant rate, particularly on PCs that have power management or "C1 clock ramping" enabled. With multiple CPUs, this may increase the skew (for example, if you have one thread that is spinning and another that is sleeping, one TSC may advance faster than the other).
  • Accessing the TSC directly doesn't allow you to take advantage of HPET.

Using an OS timer interface is better, but it may still have some of the same drawbacks, depending on the implementation.
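
As one possible illustration of going through a timer interface rather than the raw TSC, the sketch below uses std::chrono::steady_clock from the C++11 standard library, which is monotonic and is typically implemented on top of the OS timer. The helper name elapsed_ns is hypothetical, and the resolution and cost of the underlying clock are implementation-dependent.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t elapsed_ns(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               stop - start).count();
}
```

Because steady_clock never goes backwards, the result is non-negative even if the thread migrates between cores during the measurement.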

Also note that Microsoft Visual C++ doesn't support inline assembly when targeting 64-bit processors, hence the __rdtsc() intrinsic that Virne pointed out.

bk1e
Or, even better, use a platform-independent library component such as http://www.dre.vanderbilt.edu/Doxygen/Stable/ace/classACE__High__Res__Timer.html
lothar
The TSC has its drawbacks, as described, but it has advantages too. It is extremely fast (20-30 clock ticks), whereas all other mechanisms such as HPET involve a trip into ring 0 and therefore cost 1000 clock ticks or more. It is precise, whereas standard OS tools often offer a granularity of only 10 ms. HPET is not available on many systems, and when it is, it may only be accessible to the superuser. Don't ask me why; just find the nearest Linux box and check the privileges on /dev/hpet.
As to synchronization, it's typically synchronized across cores on desktop Intels (not sure about mobile Intels), and, on AMDs, you can restrict migration across cores by modifying the processor affinity of your thread.