Hi,

I'm working on some applications that route TCP and UDP traffic through multiple machines, and I'm trying to timestamp the packets (millisecond resolution) on every node to get a full picture of the latency through each node.

Before the tests I made sure that all Windows machines were NTP synced. However, my issue seems to be that the time on all machines is never fully in sync, and system times seem to vary by ±500 ms. For example, sometimes my timestamp logs show that a packet was received at node2 500 ms before it was sent from node1.

After an NTP resync of all machines, the logs show a different latency between node1 and node2, but the time difference then stays constant until the next NTP sync.

What accuracy can I expect when syncing multiple machines to the same NTP server? Is there any reason why I'm seeing these discrepancies and how do other applications measure latency through a system that involves multiple machines?

Thanks,

Tom

+1  A: 

Unless you run an NTP server on the local LAN, chances are that the clocks could be off by as much as a few seconds due to network latency.

I think it would be quite hard to get millisecond synchronization without using hardware triggers.

Edit:

A better idea would be for all the nodes to synchronize times with each other. That way you can continually calibrate it.
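As a rough sketch of what synchronizing nodes with each other could look like, the snippet below estimates the clock offset between two nodes from one request/reply exchange, using the same four-timestamp arithmetic that NTP uses. The function names are made up for illustration, the timestamps are assumed to be in milliseconds, and the estimate is only as good as the assumption that the network delay is roughly the same in both directions.

```python
def estimate_offset(t1, t2, t3, t4):
    """Estimate how far the peer clock is ahead of the local clock.

    t1: request sent      (local clock)
    t2: request received  (peer clock)
    t3: reply sent        (peer clock)
    t4: reply received    (local clock)
    Assumes symmetric network delay; asymmetry shows up as offset error.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    round_trip = (t4 - t1) - (t3 - t2)  # time on the wire, excluding peer processing
    return offset, round_trip


def best_offset(samples):
    """Pick the offset from the exchange with the shortest round trip.

    Short round trips leave the least room for asymmetric queuing delay,
    so they tend to give the most trustworthy offset.
    """
    estimates = [estimate_offset(*sample) for sample in samples]
    return min(estimates, key=lambda e: e[1])[0]


# Peer clock is 500 ms ahead; the second exchange has a delayed request.
samples = [
    (1000, 1504, 1505, 1009),
    (2000, 2540, 2541, 2051),
]
print(best_offset(samples))
```

Repeating the exchange periodically and subtracting the estimated offset from the peer's timestamps would give the continual calibration described above, without touching the system clock.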

Edit 2:

Also note that not all things equal are equal. There could be variations between two systems' clocks, whether it is the CPU clock or the RTC, due to any number of factors, e.g. temperature and age of the oscillators, and of course the lack of a real-time OS ;P

leppie
Do you have some more information on how to synchronize Windows machine times with each other? I am surprised, though, that what I'm seeing would be caused by network latency only. Those servers are sitting in a DC in New York, and a trace to the NTP server I was using shows it as being only 8 ms away.
Tom Frey
@Tom Frey: Not offhand, but I would suggest looking into multiprocessor stuff/software (MPI).
leppie
BTW, all servers are the exact same model/build and age.
Tom Frey
@Tom Frey: It could be caused by anything really, the OS most probably :)
leppie
A: 

Measure the latency between node1 and node2 by sending two-way messages. First send something from node1 to node2 and let node2 return something immediately to node1. On node1 you can now easily measure the total time elapsed for sending a message to node2 and back without having to worry about syncing times on different computers.
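A minimal sketch of this round-trip measurement, assuming UDP and a trivial echo responder standing in for node2. Here both ends run on localhost so the example is self-contained; the function names and the payload format are made up for illustration. Only the sender's clock is read, so no cross-machine time sync is involved.

```python
import socket
import threading
import time


def run_echo_server(sock):
    """Stand-in for node2: echo each datagram back until an empty one arrives."""
    while True:
        data, addr = sock.recvfrom(2048)
        if not data:
            break
        sock.sendto(data, addr)


def measure_rtt(server_addr, count=5, timeout=1.0):
    """Stand-in for node1: return round-trip times in seconds."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            payload = str(seq).encode()
            start = time.perf_counter()  # monotonic, high-resolution local clock
            client.sendto(payload, server_addr)
            data, _ = client.recvfrom(2048)
            elapsed = time.perf_counter() - start
            if data == payload:
                rtts.append(elapsed)
    return rtts


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server_addr = server.getsockname()
thread = threading.Thread(target=run_echo_server, args=(server,), daemon=True)
thread.start()

rtts = measure_rtt(server_addr)
print("min RTT: %.3f ms" % (min(rtts) * 1000.0))

# Send an empty datagram to stop the echo loop, then clean up.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as stopper:
    stopper.sendto(b"", server_addr)
thread.join(timeout=1.0)
server.close()
```

Halving the round trip gives a one-way estimate only if the path is symmetric, and the figure includes whatever time node2 spends before replying.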

Peladao
I'm not really concerned about latency between the nodes. My concern here is flow-through time across multiple nodes (app logic latency), with an accurate record of when the packet hit a certain node and how much time it spent in each node.
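To illustrate the second half of that (time spent in each node), here is a rough sketch that needs no synchronized clocks at all: each node reads its own monotonic clock on receive and on send and records only the difference. The `Packet` and `HopRecord` types and the idea of carrying the hop list along with the packet are hypothetical, not something the applications in question are known to do. The absolute time at which a packet hit a node would still depend on how well the wall clocks agree.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HopRecord:
    node: str
    dwell_ms: float  # time spent inside the node, from its own clock


@dataclass
class Packet:
    payload: bytes
    hops: list = field(default_factory=list)


def process_at_node(node_name, packet, handler, clock=time.perf_counter):
    """Run the node's app logic and record how long the packet stayed here.

    Both readings come from the same local clock, so the duration is
    unaffected by offsets between machines.
    """
    received = clock()
    handler(packet)
    sent = clock()
    packet.hops.append(HopRecord(node_name, (sent - received) * 1000.0))
    return packet


# Two nodes with simulated app logic; in practice each would run on its own machine.
pkt = Packet(b"example")
process_at_node("node1", pkt, lambda p: time.sleep(0.002))
process_at_node("node2", pkt, lambda p: time.sleep(0.005))
for hop in pkt.hops:
    print(hop.node, round(hop.dwell_ms, 3), "ms")
```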
Tom Frey