views:

151

answers:

4

Myself and another developer on my time recently moved from a Core 2 Duo machine at work to a new Core 2 Quad 9505; both running Windows XP SP3 32-bit with JDK 1.6.0_18.

Upon doing so, a couple of our automated unit tests for some timing/statistics/metrics aggregation code promptly started failing, due to what appear to be ridiculous values coming back from System.nanoTime().

Test code that shows this behaviour, reliably, on my machine is:

import static org.junit.Assert.assertThat;

import org.hamcrest.Matchers;
import org.junit.Test;

public class NanoTest {

  @Test
  public void testNanoTime() throws InterruptedException {
    final long sleepMillis = 5000;

    long nanosBefore = System.nanoTime();
    long millisBefore = System.currentTimeMillis();

    Thread.sleep(sleepMillis);

    long nanosTaken = System.nanoTime() - nanosBefore;
    long millisTaken = System.currentTimeMillis() - millisBefore;

    System.out.println("nanosTaken="+nanosTaken);
    System.out.println("millisTaken="+millisTaken);

    // Check it slept within 10% of requested time
    assertThat((double)millisTaken, Matchers.closeTo(sleepMillis, sleepMillis * 0.1));
    assertThat((double)nanosTaken, Matchers.closeTo(sleepMillis * 1000000, sleepMillis * 1000000 * 0.1));
  }

}

Typical output:

millisTaken=5001
nanosTaken=2243785148

Running it 100x yields nano results between 33% and 60% of the actual sleep time; usually around 40% though.

I understand the weaknesses in accuracy of timers in Windows, and have read related threads like Is System.nanoTime() consistent across threads?, however my understanding is that System.nanoTime() is intended for exactly the purpose we're using it :- measuring elapsed time; more accurately than currentTimeMillis().

Does anyone know why it's returning such crazy results? Is this likely to be a hardware architecture problem (the only major thing that has changed is the CPU/Motherboard on this machine)? A problem with the Windows HAL with my current hardware? A JDK problem? Should I abandon nanoTime()? Should I log a bug somewhere, or any suggestions on how I could investigate further?

UPDATE 19/07 03:15 UTC: After trying finnw's test case below I did some more Googling, coming across entries such as bugid:6440250. It also reminded me of some other strange behaviour I noticed late Friday where pings were coming back negative. So I added /usepmtimer to my boot.ini and now all the tests behave as expected., and my pings are normal too.

I'm a bit confused about why this was still an issue though; from my reading I thought TSC vs PMT issues were largely resolved in Windows XP SP3. Could it be because my machine was originally SP2, and was patched to SP3 rather than installed originally as SP3? I now also wonder whether I should be installing patches like the one at MS KB896256. Maybe I should take this up with the corporate desktop build team?

+1  A: 

You probably want to read the answers to this other stack overflow question: Is System.nanoTime() completely useless?.

In summary, it would appear that nanoTime relies on operating system timers that may be affected by the presence of multiple core CPUs. As such, nanoTime may not be that useful on certain combinations of OS and CPU, and care should be taken when using it in portable Java code that you intend to run on multiple target platforms. There seems to be a lot of complaining on the web on this subject, but not much consensus on a meaningful alternative.

Tom
This is not a completetly accurate summary. System.nanoTime depends on OS-specific timers. There have been one or two bugs in the past, e.g. on the Athlon 64 chips in Windows, however on most systems you can rely on nanoTime to work pretty well. I use it for animation and timing in multi-core games, and have never had any problems.
mikera
Thanks for the clarification mikera. I've updated my answer to (hopefully) improve the accuracy.
Tom
Thanks Tom. As I've mentioned in the updated question above, I managed to return to "normal" behaviour by forcing use of the PMT. I guess I still have niggling concerns about whether this will behave the way we expect across multiple cores though. And yes, without a meaningful alternative (short of "go back to currentTimeMillis") it's difficult to know how best to proceed!
Chad
+1  A: 

On my system (Windows 7 64-Bit, Core i7 980X):

nanosTaken=4999902563
millisTaken=5001

System.nanoTime() uses OS-specific calls, so I expect that you are seeing a bug in your Windows/processor combination.

mikera
Thanks mikera, it looks like Windows was using a timer style that does not behave correctly on my Core 2 Quad. Forcing it to use the Power Management Timer has made it behave normally again; but I don't quite understand why I had to do this!
Chad
+1  A: 

It is difficult to tell whether this is a bug or just normal timer variation between cores.

An experiment you could try is to use native calls to force the thread to run on a specific core.

Also, to rule out power management effects, try spinning in a loop as an alternative to sleep():

import com.sun.jna.Native;
import com.sun.jna.NativeLong;
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.W32API;

public class AffinityTest {

    private static void testNanoTime(boolean sameCore, boolean spin)
    throws InterruptedException {
        W32API.HANDLE hThread = kernel.GetCurrentThread();
        final long sleepMillis = 5000;

        kernel.SetThreadAffinityMask(hThread, new NativeLong(1L));
        Thread.yield();
        long nanosBefore = System.nanoTime();
        long millisBefore = System.currentTimeMillis();

        kernel.SetThreadAffinityMask(hThread, new NativeLong(sameCore? 1L: 2L));
        if (spin) {
            Thread.yield();
            while (System.currentTimeMillis() - millisBefore < sleepMillis)
                ;
        } else {
            Thread.sleep(sleepMillis);
        }

        long nanosTaken = System.nanoTime() - nanosBefore;
        long millisTaken = System.currentTimeMillis() - millisBefore;

        System.out.println("nanosTaken="+nanosTaken);
        System.out.println("millisTaken="+millisTaken);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Sleeping, different cores");
        testNanoTime(false, false);
        System.out.println("\nSleeping, same core");
        testNanoTime(true, false);
        System.out.println("\nSpinning, different cores");
        testNanoTime(false, true);
        System.out.println("\nSpinning, same core");
        testNanoTime(true, true);
    }

    private static final Kernel32Ex kernel =
        (Kernel32Ex) Native.loadLibrary(Kernel32Ex.class);

}

interface Kernel32Ex extends Kernel32 {
    NativeLong SetThreadAffinityMask(HANDLE hThread, NativeLong dwAffinityMask);
}

If you get very different results depending on core selection (e.g. 5000ms on the same core but 2200ms on different cores) that would suggest that the problem is just natural timer variation between cores.

If you get very different results from sleeping vs. spinning, it is more likely due to power management slowing down the clocks.

If none of the four results are close to 5000ms, then it might be a bug.

finnw
Thanks finnw, this is interesting. My results are: Sleeping, different cores nanosTaken=2049217124 millisTaken=4985 Sleeping, same core nanosTaken=1808868148 millisTaken=4985 Spinning, different cores nanosTaken=5015172794 millisTaken=5000 Spinning, same core nanosTaken=5015295288 millisTaken=5000Do you think this implies something broken with power management on my machine?
Chad
After doing some more reading, triggered by your test I tried rebooting my machine with /usepmtimer in boot.ini. Now your test, (and my original tests) behave normally.I've edited my question accordingly. Should I have to do this?
Chad
It is not necessarily "broken" but it is clear that the TSC is not suitable for high-precision timing on your machine and using the PM timer gives better results. I thought /usepmtimer was the default on XP SP3 but your results suggest otherwise.
finnw
Yes, well my machine was actually an SP2 machine recently patched to SP3 via some kind of corporate patching process, which may or may not have adulterated the standard MS SP3 distribution; or perhaps when SP3 is applied as a patch the boot.ini is not changed. Not sure.
Chad
A: 

The problem was resolved (with some open suspicions about the suitability of nanoTime() on multi-core systems!) by adding /usepmtimer to the end of my C:\boot.ini string; forcing Windows to use the Power Management timer rather than the TSC. It's an open question as to why I needed to do this given I'm on XP SP3, as I understood that this was the default, however perhaps it was due to the manner in which my machine was patched to SP3.

Chad