Hi, I am using timed_wait from the Boost C++ library and I am running into a problem with leap seconds.

Here is a quick test:

#include <boost/thread.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main(){
        // Determine the absolute time for this timer.
        boost::system_time tAbsoluteTime = boost::get_system_time() + boost::posix_time::milliseconds(35000);

        bool done = false; // initialized; reading an uninitialized bool is undefined behavior
        boost::mutex m;
        boost::condition_variable cond;

        boost::unique_lock<boost::mutex> lk(m);
        while(!done)
        {
            // timed_wait returns false once the absolute deadline has passed.
            if(!cond.timed_wait(lk,tAbsoluteTime))
            {
                done = true;
                std::cout << "timed out" << std::endl;
            }
        }
        return 0;
}

The timed_wait function returns 24 seconds earlier than it should. 24 seconds is the current number of leap seconds in UTC.

So, boost is widely used but I could not find any info about this particular problem. Has anyone else experienced this problem? What are the possible causes and solutions?

Notes: I am using Boost 1.38 on a Linux system. I've heard that this problem doesn't happen on Mac OS.

UPDATE: A little more info: this is happening on two Red Hat machines with kernel 2.6.9. I have executed the same code on an Ubuntu machine with kernel 2.6.30 and the timer behaves as expected.

So I think this is probably caused by the OS or by a misconfiguration on the Red Hat machines.

I have coded a workaround that adjusts the time to UTC, then takes the difference introduced by this adjustment and adds it to the original time. This seems like a bad idea to me, because if this code is executed on a machine without this problem, the timer might fire 24s LATE. I still could not find the reason for this.

+1  A: 

On a Linux system, the system clock will follow the POSIX standard, which mandates that leap seconds are NOT observed! If you expected otherwise, that's probably the source of the discrepancy you're seeing. This document has a great explanation of how UTC relates to other time scales, and the problems one is likely to encounter if one relies on the operating system's concept of timekeeping.

Jim Lewis
So, the solution is to add the leap seconds if the OS is Linux? Isn't there an OS-agnostic solution? Today my target OS is Linux but it might change someday...
Isac
What I don't understand is why there is this 24s difference if the get_system_time calls are on the same machine? Does timed_wait use some other function to get the time? I've read the boost documentation and they say that since 1.36 they use the time_and_date classes.
Isac
@Isac: There are no operating systems that I'm aware of where the system clock respects UTC leap seconds. Yet you describe a 24s offset, which equals the number of leap seconds introduced since the Unix epoch. So I have to echo Mark Ransom's comments: you need to tell us how you're measuring this 24s difference, because that's not apparent from the code you've posted.
Jim Lewis
I did the measurement like this: First method: checking the logs and seeing the timer was firing 24s earlier. Second method: setting the timer to 24s or lower and seeing it fire instantaneously. Am I doing this the wrong way?
Isac
@Jim Lewis: Actually, there have been 25 leap seconds since 1970. Don't forget the one in Jan 2009.
derobert
So when the timer expires and you print out both the requested abs_time timeout and then current time, you're saying they're off by 24 seconds?
Mark B
Well, that would happen inside timed_wait (which calls other functions to do that), so I don't know what is adding or removing these seconds. My guess is that this kind of check is left to the OS and the OS is using a different time from get_system_time.
Isac
@Isac: I don't think we're going to make much headway until you post the actual code...something self-contained and complete enough that we can reproduce the problem you're having. Otherwise we're just playing guessing games. The devil is in the details!
Jim Lewis
The problem happens with this exact code on a Red Hat machine with a 2.6.9 kernel. I have executed the code on an Ubuntu machine with a 2.6.30 kernel and the problem didn't happen. So, the code is this! The problem might not be the code (it could be a kernel-level problem or a configuration problem).
Isac
@Isac: What you posted is not complete enough to even compile, so it's not really "this exact code". But your and Emile Cormier's observations that it works correctly on a different machine are important...you might want to update your question with the additional information. Maybe it's not a UTC versus POSIX time issue after all, in which case I'll be deleting my answer.
Jim Lewis
I just did that. I will put my test code as well.
Isac
+1  A: 

Is it possible that done is getting set prematurely and a spurious wakeup is causing the loop to exit sooner than you expected?

Mark B
That sounds like a plausible scenario, given the information we have so far.
Jim Lewis
That would not cause the timer to fire 24s earlier consistently.
Isac
A: 

Ok, here is what I did. It's a workaround and I am not happy with it but it was the best I could come up with:

#include <boost/thread.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/date_time/c_local_time_adjustor.hpp>
#include <iostream>

int main(){
        typedef boost::date_time::c_local_adjustor<boost::system_time> local_adj;

        // Determine the absolute time for this timer.
        boost::system_time tAbsoluteTime = boost::get_system_time() + boost::posix_time::milliseconds(25000);

        /*
         * A leap second is a positive or negative one-second adjustment to the Coordinated
         * Universal Time (UTC) time scale that keeps it close to mean solar time.
         * UTC, which is used as the basis for official time-of-day radio broadcasts for civil time,
         * is maintained using extremely precise atomic clocks. To keep the UTC time scale close to
         * mean solar time, UTC is occasionally corrected by an adjustment, or "leap",
         * of one second.
         */
        boost::system_time tAbsoluteTimeUtc = local_adj::utc_to_local(tAbsoluteTime);

        // Calculate the local-to-utc difference.
        boost::posix_time::time_duration tLocalUtcDiff = tAbsoluteTime - tAbsoluteTimeUtc;

        // Get only the seconds from the difference. These are the leap seconds.
        tAbsoluteTime += boost::posix_time::seconds(tLocalUtcDiff.seconds());

        bool done = false; // initialized; reading an uninitialized bool is undefined behavior
        boost::mutex m;
        boost::condition_variable cond;

        boost::unique_lock<boost::mutex> lk(m);
        while(!done)
        {
            if(!cond.timed_wait(lk,tAbsoluteTime))
            {
                done = true;
                std::cout << "timed out" << std::endl;
            }
        }
        return 0;
}

I've tested it on problematic and non-problematic machines and it worked as expected on both, so I'm keeping it until I find a better solution.

Thank you all for your help.

Isac