I am sending network packets from one thread and receiving replies on a 2nd thread that runs on a different CPU core. My process measures the time between send & receive of each packet (similar to ping). I am using rdtsc for getting high-resolution, low-overhead timing, which is needed by my implementation.

All measurements look reliable. Still, I am worried about rdtsc accuracy across cores, since I've been reading some texts which imply that the TSC is not synchronized between cores.

I found the following info about the TSC on Wikipedia:

Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward for all Intel processors.

Still, I am worried about accuracy across cores, and this is my question.

More Info

  • I run my process on an Intel Nehalem machine.
  • Operating System is Linux.
  • The "constant_tsc" cpu flag is set for all the cores.
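
The flag check mentioned above can be automated. This is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout with one "flags" line per logical CPU; the function name all_cpus_have_flag is made up for illustration. It only confirms that the kernel reports the flag, not that the counters are synchronized between cores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans a /proc/cpuinfo-style stream.
 * Returns 1 if every "flags" line lists `flag` as a whole word,
 * 0 if at least one CPU lacks it, and -1 if no "flags" line was found.
 * Assumes each flags line fits in the 8 KiB buffer. */
int all_cpus_have_flag(FILE *f, const char *flag)
{
    char line[8192];
    int seen = 0;

    assert(f != NULL && flag != NULL);
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) != 0)
            continue;                       /* not a flags line */
        seen = 1;

        char *colon = strchr(line, ':');
        if (colon == NULL)
            return 0;                       /* malformed line: treat as missing */

        int found = 0;
        for (char *tok = strtok(colon + 1, " \t\n"); tok != NULL;
             tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0) {   /* whole-word match only */
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return seen ? 1 : -1;
}
```

Typical use would be to open "/proc/cpuinfo" with fopen() and call all_cpus_have_flag(f, "constant_tsc") once at startup.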
A: 

I recommend that you don't use rdtsc. Not only is it not portable, it's not reliable and generally won't work: on some systems the TSC does not tick uniformly (for example, when SpeedStep is active). If you want accurate timing information, you should set the SO_TIMESTAMP option on the socket and use recvmsg() to get the message together with a (microsecond resolution) timestamp.

Moreover, the timestamp you get with SO_TIMESTAMP actually IS the time the kernel got the packet, not when your task happened to notice.

MarkR
Thanks for the answer. Note that with the constant_tsc flag, rdtsc does update uniformly; see the quote I added to my question. SO_TIMESTAMP has microsecond precision, while rdtsc has nanosecond precision, and this is the precision I need. I am not interested in the time the packet arrived at the kernel, but in the time the user got it, since this is the part my application accelerates.
avner
+2  A: 

On Linux you can use clock_gettime(3) with CLOCK_MONOTONIC_RAW, which gives you nanosecond resolution and is not subject to NTP updates (if any happened).
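
A minimal sketch of such a timer, assuming a POSIX system. It falls back to CLOCK_MONOTONIC when the headers do not define CLOCK_MONOTONIC_RAW; the names TIMING_CLOCK, now_ns and timing_clock_resolution_ns are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Prefer the raw clock when the headers provide it. */
#ifdef CLOCK_MONOTONIC_RAW
#define TIMING_CLOCK CLOCK_MONOTONIC_RAW
#else
#define TIMING_CLOCK CLOCK_MONOTONIC
#endif

/* Current reading of the timing clock, in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(TIMING_CLOCK, &ts);
    assert(rc == 0);
    (void)rc;                               /* silence unused warning with NDEBUG */
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Resolution the kernel reports for the timing clock, in nanoseconds,
 * or -1 if it cannot be queried. */
long timing_clock_resolution_ns(void)
{
    struct timespec res;
    if (clock_getres(TIMING_CLOCK, &res) != 0)
        return -1;
    return (long)res.tv_sec * 1000000000L + res.tv_nsec;
}
```

The difference between two now_ns() readings gives the elapsed time; timing_clock_resolution_ns() shows whether the system actually delivers nanosecond granularity.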

nir
Thanks, still not good. CLOCK_MONOTONIC_RAW is undefined in my environment. I do have CLOCK_MONOTONIC in time.h, which I already tried. It is true that struct timespec has nanosecond resolution; still, when calling clock_gettime for CLOCK_MONOTONIC, the last 3 digits always have the same value; hence, in practice it is only microsecond resolution.
avner
Perhaps your system does not support high-resolution timers? What do you get when you run this code:
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        while (1) {
            struct timespec ts;
            /* read the monotonic clock and print seconds and nanoseconds */
            clock_gettime(CLOCK_MONOTONIC, &ts);
            printf("%ld %ld\n", (long)ts.tv_sec, ts.tv_nsec);
            sleep(1);
        }
        return 0;
    }
nir
1) Below are the first lines output by your code (you can see that nsec always ends in 246):
279595 629885246
279596 630958246
279597 631777246
279598 633596246
2) An additional problem with clock_gettime is its overhead. According to my statistics (reading the clock 1001 times repeatedly without sleep), the average overhead of clock_gettime(CLOCK_MONOTONIC, ...) is [...], while reading rdtsc takes only 8 nsec on the same machine.
avner
This strongly suggests that you do not have high-resolution timers enabled. Check your distro's documentation for this. As for the overhead, there is really not much you can do about that...
nir
My question was about rdtsc, because I need high resolution and low overhead. I only want to make sure of its reliability across cores, since I couldn't find documentation about it. My feeling is good, though. Besides, in order to use a high-resolution timer (probably CLOCK_MONOTONIC_HR), I'll need to recompile the kernel. This is not an option, since I can't require that from all my customers.
avner
Then just make sure to disable CPU throttling and set your affinity to a specific CPU.
nir
I already set CPU affinity :) My question is about 2 threads on 2 different CPU cores. My recv thread polls the NIC for packets without context switches and without being delayed by sending outbound packets. In my environment microseconds count a lot!
avner
A: 

You can set thread affinity using the sched_setaffinity() API in order to run your thread on one CPU core.
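
A minimal sketch of pinning, assuming Linux with glibc, where sched_setaffinity(2) with a pid of 0 applies to the calling thread. The names first_allowed_cpu and pin_calling_thread are made up for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Lowest-numbered CPU the calling thread is currently allowed to run on,
 * or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel). */
int pin_calling_thread(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

Each thread would call pin_calling_thread() with its own core number at startup, for example the send thread on one core and the receive thread on another.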

Dima
I already set CPU affinity :( My question is about 2 threads on 2 different CPU cores. My recv thread polls the NIC for packets without context switches and without being delayed by sending outbound packets. In my environment microseconds count a lot!
avner
You cannot do it across cores... try HPET: http://en.wikipedia.org/wiki/HPET
Dima
HPET sounds bad for my needs; see my previous comment about HPET. RDTSC looks great and reliable in hundreds of tests I did in multi-core environments (even on machines that were up for many weeks). In addition, please read the citation about "TSC as a wall clock timer" in my question. Bottom line, I am only looking for formal confirmation. In practice, RDTSC does seem to do the job.
avner
Drift between cores may occur (by hundreds of milliseconds).
Dima