Update: I fixed the delta calculations in the code, but the issue remains.

Folks, could you please explain why I'm getting very strange results from time to time with the following code:

#include <unistd.h>
#include <sys/time.h>
#include <stdio.h>

int main()
{
  struct timeval start, end;
  long mtime1, mtime2, diff;

  while(1)
  {
    gettimeofday(&start, NULL);

    usleep(2000);

    gettimeofday(&end, NULL);

    mtime1 = (start.tv_sec * 1000 + start.tv_usec/1000.0);
    mtime2 = (end.tv_sec * 1000 + end.tv_usec/1000.0);

    diff = mtime2 - mtime1;
    if(diff > 10) 
        printf("WTF: %ld\n", diff);
  }

  return 0;
}

(You can compile and run it with: gcc test.c -o out -lrt && ./out)

What I'm seeing is sporadically large values of the diff variable, almost every second or even more often, e.g.:

$ gcc test.c -o out -lrt && ./out 
WTF: 14
WTF: 11
WTF: 11
WTF: 11
WTF: 14
WTF: 13
WTF: 13
WTF: 11
WTF: 16

How is this possible? Is the OS to blame? Is it doing too much context switching? But my box is idle (load average: 0.02, 0.02, 0.3).

Here is my Linux kernel version:

$ uname -a
Linux kurluka 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:56 UTC 2010 i686 GNU/Linux
A: 

Your formula is wrong. You have to convert both times to the same scale, in your example milliseconds.

double mtime1 = (start.tv_sec * 1000 + start.tv_usec / 1000.0);
double mtime2 = (end.tv_sec * 1000 + end.tv_usec / 1000.0);

double diff = mtime2 - mtime1;
if (diff > 10)
    printf("WTF: %f\n", diff);  /* %f, since diff is a double */

You have to subtract the converted values.

Example: t1 = 1.999999 s (tv_sec = 1, tv_usec = 999999), t2 = 2.000001 s (tv_sec = 2, tv_usec = 1), so an interval of 2 µs.

With the original formula (subtracting the tv_sec and tv_usec fields separately and then combining them) you calculate:

seconds:  2 - 1 = 1
useconds: 1 - 999999 = -999998
mtime = (1 * 1000 + (-999998) / 1000.0) + 0.5 = 0.502

which is obviously wrong.

My method gives:

mtime1 = (1 * 1000 + 999999 / 1000.0) = 1999.999
mtime2 = (2 * 1000 +      1 / 1000.0) = 2000.001

2000.001 - 1999.999 = 0.002 ms
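
Equivalently, here is a small sketch (my addition, not from the original answer) that avoids floating point by converting both timestamps to whole microseconds in a 64-bit integer before subtracting; this also sidesteps any 32-bit overflow in tv_sec * 1000:

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    usleep(2000);                               /* same 2 ms sleep as in the question */
    gettimeofday(&end, NULL);

    /* Widen to 64 bits before multiplying so tv_sec * 1000000 cannot overflow. */
    long long us1 = (long long)start.tv_sec * 1000000 + start.tv_usec;
    long long us2 = (long long)end.tv_sec   * 1000000 + end.tv_usec;
    long long diff_us = us2 - us1;

    printf("elapsed: %lld us (%.3f ms)\n", diff_us, diff_us / 1000.0);
    return 0;
}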
tristopia
Yes it is wrong, but it does not explain such big differences. It leads to a 1 millisecond difference, not 10 ms or more. See my answer.
neuro
The error can be up to 1 s with the wrong formula, but you're right that it was not the OP's problem.
tristopia
A: 

I've already done such measurements, and my conclusion is exactly the same. It happens equally on Windows and on Linux.

A program that builds a histogram of the measured delays, in buckets of 10^-n seconds, gives the following results.

0.1 0
0.01 0
0.001 0
0.0001 2
1e-05 24
1e-06 69726023
Total: 69726049
Duration: 6.47403 seconds.
Average: 0.0928495 microseconds.

But note that this is on a brand new system. I recall using this a year ago on a 2004 system, and having a couple of hits per second in the 0.01 range (above 10 ms).

Didier Trosset
Yes. I had this sort of problem too. The problem is that sleep only guarantees that you will sleep at least a given amount of time, but as Linux (and Windows) are not real-time OSes, they can sleep longer than that. See my answer ...
neuro
+2  A: 

My guess is that it's related to the OS. Try running the process at realtime priority (see the chrt program) and see if that helps.
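
For illustration, here is a minimal sketch of requesting realtime priority from inside the program rather than via chrt (my addition, not part of the original answer; it requires root or CAP_SYS_NICE, and the priority value 10 is arbitrary):

#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp;
    sp.sched_priority = 10;   /* modest SCHED_FIFO priority; the value is arbitrary */

    /* Needs root or CAP_SYS_NICE, the same requirement as running chrt. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        perror("sched_setscheduler");
        return 1;
    }

    /* ... the timing loop from the question would go here ... */
    return 0;
}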

On another note, you're calculating mtime incorrectly. Here's a routine that I use; although it's for struct timespec rather than struct timeval (nanoseconds instead of microseconds), the principle should be clear:

struct timespec diff(struct timespec start, struct timespec end)
{
    struct timespec temp;
    if ((end.tv_nsec - start.tv_nsec) < 0) {
        /* Borrow one second when the nanosecond part would go negative. */
        temp.tv_sec  = end.tv_sec - start.tv_sec - 1;
        temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
    } else {
        temp.tv_sec  = end.tv_sec - start.tv_sec;
        temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }
    return temp;
}
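
A possible usage sketch (my addition, assuming clock_gettime with CLOCK_MONOTONIC is available and the diff() routine above is in scope; compile with -lrt on older glibc):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* assumes the diff() routine above is defined before main() */

int main(void)
{
    struct timespec start, end, d;

    clock_gettime(CLOCK_MONOTONIC, &start);   /* monotonic clock, immune to wall-clock jumps */
    usleep(2000);
    clock_gettime(CLOCK_MONOTONIC, &end);

    d = diff(start, end);
    printf("slept %ld.%09ld s\n", (long)d.tv_sec, d.tv_nsec);
    return 0;
}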
janneb
The calculation change didn't make any difference for me, but `chrt` sure did. Same Linux kernel as the OP.
Fred Larson
+4  A: 

The sleep functions only guarantee that you sleep AT LEAST a certain amount of time. As Linux is not a real-time OS, you CANNOT be sure that it will sleep ONLY the amount of time you want. That is a problem, as you CANNOT count on that value. As you have pointed out, the sleep time sometimes turns out to be really BIG.

The Linux scheduler cannot guarantee that. With a real-time OS you can get that.

Your formula was wrong in a way, but I think that cannot be the reason why you see such big sleep times. I checked the two formulae with this snippet, and I get the same result:

#include <unistd.h>
#include <sys/time.h>
#include <time.h>
#include <stdio.h>

int main()
{
  struct timeval start, end;
  long mtime, mtime2, start_time, end_time, seconds, useconds;

  while(1)
  {
    gettimeofday(&start, NULL);

    usleep(2000);

    gettimeofday(&end, NULL);

    seconds  = end.tv_sec  - start.tv_sec;
    useconds = end.tv_usec - start.tv_usec;

    mtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;

    start_time = ((start.tv_sec) * 1000 + start.tv_usec/1000.0) + 0.5;
    end_time = ((end.tv_sec) * 1000 + end.tv_usec/1000.0) + 0.5;

    mtime2 = end_time - start_time;

    if(mtime > 10 || mtime2 > 10)
    {
      printf("WTF: %ld\n", mtime);
      printf("WTF2: %ld\n", mtime2);
    }
  }

  return 0;
}

The results:

$ gcc test.c -o out -lrt && ./out
WTF: 11
WTF2: 12
WTF: 21
WTF2: 21

I thought the original formula was wrong because the useconds part wraps around and could lead to a big negative difference, but it does not lead to such big times, since you use signed long integers ...

my2cents

Edit: from man nanosleep:

The current implementation of nanosleep() is based on the normal kernel timer mechanism, which has a resolution of 1/HZ s (see time(7)). Therefore, nanosleep() pauses always for at least the specified time, however it can take up to 10 ms longer than specified until the process becomes runnable again. For the same reason, the value returned in case of a delivered signal in *rem is usually rounded to the next larger multiple of 1/HZ s.
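
As an aside (my addition, not part of the original answer), one way to keep these per-iteration overruns from accumulating is to sleep until an absolute deadline with clock_nanosleep, assuming CLOCK_MONOTONIC and TIMER_ABSTIME are available (compile with -lrt on older glibc); the 2 ms period mirrors the question:

#include <stdio.h>
#include <time.h>

int main(void)
{
    const long period_ns = 2000000;              /* 2 ms target period */
    struct timespec deadline, now;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    for (;;) {
        /* Advance the deadline by one period, carrying into seconds as needed. */
        deadline.tv_nsec += period_ns;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_nsec -= 1000000000L;
            deadline.tv_sec  += 1;
        }

        /* Sleep until the absolute deadline; a late wakeup does not shift later deadlines. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);

        clock_gettime(CLOCK_MONOTONIC, &now);
        long late_us = (now.tv_sec - deadline.tv_sec) * 1000000L
                     + (now.tv_nsec - deadline.tv_nsec) / 1000;
        if (late_us > 10000)                     /* same 10 ms threshold as the question */
            printf("woke up %ld us late\n", late_us);
    }
    return 0;
}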

neuro
Thanks for the explanation regarding the default Linux scheduler not being realtime. It's quite surprising to discover such an issue. I wonder what soft-realtime system developers (e.g. in gamedev) do in such situations, since delta time has a big impact in such systems...
pachanga
@pachanga: You are welcome. I was surprised too when I discovered it. It is an issue. You can rely on the precision of the time readings, which is good; use nanosleep and try to implement things in a way that absorbs the error, e.g. use only ONE sleep in your whole synchro loop to minimize errors. You should also sleep a bit less than the target. I can imagine more complex mechanisms, but nothing you can truly rely on short of not using sleep :)
neuro
Yep, "not using sleep" that was I also thinking about as well... however in this case my application is going to totally consume all CPU clocks :(
pachanga
If you need a truly regular timer, use a timerfd (`timerfd_create()`) set to a recurring interval. Although your wakeups may be delayed a bit from time to time, the errors shouldn't pile up.
bdonlan
Also note that that man page is a bit out of date; modern Linux uses high-resolution timers, more accurate than the 1/HZ tick, if available (and all modern PCs have these more accurate timers).
bdonlan
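
For reference, a minimal sketch of that recurring-timer approach (my addition; it assumes Linux 2.6.25 or later with timerfd support, and the 2 ms period is only illustrative):

#include <stdint.h>
#include <stdio.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    /* Recurring 2 ms timer; missed expirations are counted, not lost. */
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1) { perror("timerfd_create"); return 1; }

    struct itimerspec its;
    its.it_interval.tv_sec  = 0;
    its.it_interval.tv_nsec = 2000000;   /* period: 2 ms */
    its.it_value.tv_sec     = 0;
    its.it_value.tv_nsec    = 2000000;   /* first expiry: 2 ms from now */
    if (timerfd_settime(fd, 0, &its, NULL) == -1) { perror("timerfd_settime"); return 1; }

    for (;;) {
        uint64_t expirations;
        /* Blocks until the next tick; a value > 1 means we overslept and ticks piled up. */
        if (read(fd, &expirations, sizeof expirations) != sizeof expirations) break;
        if (expirations > 1)
            printf("late: missed %llu ticks\n", (unsigned long long)(expirations - 1));
    }
    return 0;
}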
+1  A: 

Found it; see the man page:

http://linux.die.net/man/3/usleep

The sleep may be lengthened "by the granularity of system timers", which is 10 ms AFAIK. So the usleep can expire long before the process gets rescheduled.

It's also consistent with the values you get, which are on the order of magnitude of a "normal" time slice.
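
As a side note (my addition, not from the answer), you can ask the kernel what resolution it reports for a clock with clock_getres; on a kernel with high-resolution timers this may print 1 ns even though scheduling latency is still on the order of a tick (compile with -lrt on older glibc):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    /* Reported clock resolution: 1 ns with high-resolution timers,
       1/HZ (e.g. 10 ms) on older tick-based kernels. */
    if (clock_getres(CLOCK_REALTIME, &res) == 0)
        printf("CLOCK_REALTIME resolution: %ld.%09ld s\n",
               (long)res.tv_sec, res.tv_nsec);
    return 0;
}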

tristopia