I've been building an error logging app recently and was after a way of accurately timestamping the incoming data. When I say accurately I mean each timestamp should be accurate relative to each other (no need to sync to an atomic clock or anything like that).

I've been using datetime.now() as a first stab, but this isn't perfect:

>>> for i in range(0,1000):
...     datetime.datetime.now()
...
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
etc.

The changes between clock readings for the first second of samples look like this:

uSecs    difference
562000  
578000  16000
609000  31000
625000  16000
640000  15000
656000  16000
687000  31000
703000  16000
718000  15000
750000  32000
765000  15000
781000  16000
796000  15000
828000  32000
843000  15000
859000  16000
890000  31000
906000  16000
921000  15000
937000  16000
968000  31000
984000  16000

So it looks like the timer data is only updated every ~15-32ms on my machine. The problem comes when we come to analyse the data because sorting by something other than the timestamp and then sorting by timestamp again can leave the data in the wrong order (chronologically). It would be nice to have the time stamps accurate to the point that any call to the time stamp generator gives a unique timestamp.
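A table like the one above can be reproduced with a short script. This is only a sketch: the function names `observed_ticks` and `tick_deltas_us` are made up for illustration, and the step sizes it reports will depend on the OS and hardware.

```python
from datetime import datetime, timedelta

def observed_ticks(samples=100000):
    """Return the distinct consecutive datetime.now() values seen in a tight loop."""
    ticks = []
    for _ in range(samples):
        now = datetime.now()
        if not ticks or now != ticks[-1]:
            ticks.append(now)
    return ticks

def tick_deltas_us(ticks):
    """Differences between successive readings, in whole microseconds."""
    one_us = timedelta(microseconds=1)
    return [(b - a) // one_us for a, b in zip(ticks, ticks[1:])]

if __name__ == "__main__":
    deltas = tick_deltas_us(observed_ticks())
    if deltas:
        print("smallest step: %d us, largest step: %d us" % (min(deltas), max(deltas)))
    else:
        print("the clock did not advance during the sample run")
```

If the smallest step is in the thousands of microseconds, the clock is only ticking every few milliseconds, which is the situation described above.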

I had been considering some methods involving using a time.clock() call added to a starting datetime, but would appreciate a solution that would work accurately across threads on the same machine. Any suggestions would be very gratefully received.

+7  A: 

time.clock() only measures wallclock time on Windows. On other systems, time.clock() actually measures CPU time. On those systems time.time() is more suitable for wallclock time, and it has as high a resolution as Python can manage, which is as high as the OS can manage; it usually uses gettimeofday(3) (microsecond resolution) or ftime(3) (millisecond resolution). Other OS restrictions can make the real resolution a lot coarser than that. datetime.datetime.now() uses time.time(), so calling time.time() directly won't be better.

For the record, if I use datetime.datetime.now() in a loop, I see about a 1/10000 second resolution. From looking at your data, you have much, much coarser resolution than that. I'm not sure if there's anything Python as such can do, although you may be able to convince the OS to do better through other means.

I seem to recall that on Windows, time.clock() is actually (slightly) more accurate than time.time(), but it measures wallclock since the first call to time.clock(), so you have to remember to 'initialize' it first.
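On current Python versions the clocks can be inspected directly. Note that time.clock() was removed in Python 3.8; time.perf_counter() is the usual replacement for interval timing. The sketch below (the helper name `describe_clocks` is made up) uses time.get_clock_info(), available since Python 3.3, to list what each clock is built on. The resolution it reports is the advertised one and may be finer than what you actually observe in a loop.

```python
import time

def describe_clocks(names=("time", "monotonic", "perf_counter")):
    """Map each clock name to (implementation, advertised resolution in seconds)."""
    info = {}
    for name in names:
        clock = time.get_clock_info(name)
        info[name] = (clock.implementation, clock.resolution)
    return info

if __name__ == "__main__":
    for name, (impl, res) in describe_clocks().items():
        print("%-13s %-35s resolution=%g s" % (name, impl, res))
```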

Thomas Wouters
Indeed, here is what it looks like on Debian/Linux: datetime.datetime(2008, 10, 1, 17, 11, 31, 875190) datetime.datetime(2008, 10, 1, 17, 11, 31, 875199) datetime.datetime(2008, 10, 1, 17, 11, 31, 875207)
bortzmeyer
I can confirm that clock is indeed more accurate on all the Windows machines I've tried it on.
Jon Cage
+2  A: 

Here is a thread about Python timing accuracy:

http://stackoverflow.com/questions/85451/python-timeclock-vs-timetime-accuracy

Corey Goldberg
Yeah, I'd already seen that one, but those are relative to a process starting or the call to clock rather than an absolute(ish) time.
Jon Cage
+4  A: 

You're unlikely to get sufficiently fine-grained control that you can completely eliminate the possibility of duplicate timestamps - you'd need resolution smaller than the time it takes to generate a datetime object. There are a couple of other approaches you might take to deal with it:

  1. Deal with it. Leave your timestamps non-unique as they are, but rely on Python's sort being stable to deal with reordering problems. Sorting on timestamp first, then on something else, will retain the timestamp ordering among equal keys - you just have to be careful to always start from the timestamp-ordered list every time, rather than doing multiple sorts on the same list.

  2. Append your own value to enforce uniqueness. E.g. include an incrementing integer value as part of the key, or append such a value only when consecutive timestamps are identical.

The following will guarantee unique timestamp values:

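    import threading
    from datetime import datetime

    class TimeStamper(object):
        def __init__(self):
            self.lock = threading.Lock()
            self.prev = None   # last distinct clock reading seen
            self.count = 0     # suffix for the next duplicate of self.prev

        def getTimestamp(self):
            # Hold the lock so concurrent callers never get the same suffix.
            with self.lock:
                ts = str(datetime.now())
                if ts == self.prev:
                    # Same clock reading as before: append a counter.
                    ts += '.%04d' % self.count
                    self.count += 1
                else:
                    self.prev = ts
                    self.count = 1
            return ts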

For multiple processes (rather than threads), it gets a bit trickier though.
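For option 1, here is a small sketch of what relying on sort stability looks like, and what goes wrong if you re-sort an already re-ordered list. The records are made-up sample data, not real log output.

```python
from operator import itemgetter

# Hypothetical log records: (timestamp, source, message), in arrival order.
# The first two share a timestamp because the clock had not ticked yet.
records = [
    ("13:17:27.562", "sensor-b", "first"),
    ("13:17:27.562", "sensor-a", "second"),
    ("13:17:27.578", "sensor-a", "third"),
]

# A stable sort by timestamp keeps "first" ahead of "second".
by_time = sorted(records, key=itemgetter(0))

# Sort by source for analysis, starting from the time-ordered list.
by_source = sorted(by_time, key=itemgetter(1))

# Re-sorting by_source on timestamp does NOT restore arrival order for ties:
# "second" now comes before "first".
resorted = sorted(by_source, key=itemgetter(0))

if __name__ == "__main__":
    print([r[2] for r in by_time])
    print([r[2] for r in resorted])
```

So the time-ordered list has to be kept around as the starting point for every other ordering.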

Brian
I realize this is a bit nitpicky, but you mean "strictly increasing integer" not "monotonically increasing integer". A monotonically increasing set means that it doesn't ever decrease, but could still have equal values.
Tony Arkles
All nitpicks gratefully accepted. You're absolutely right - I've fixed the sloppy wording.
Brian
+2  A: 

"timestamp should be accurate relative to each other "

Why time? Why not a sequence number? If it's a client of a client-server application, network latency makes timestamps kind of random.

Are you matching some external source of information? Say a log on another application? Again, if there's a network, those times won't be too close.

If you must match things between separate apps, consider passing GUIDs around so that both apps log the GUID value. Then you could be absolutely sure they match, irrespective of timing differences.

If you want the relative order to be exactly right, maybe it's enough for your logger to assign a sequence number to each message in the order they were received.
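A sequence-number logger along those lines might look something like this. It is a rough sketch only: `SequencedLogger` and its `log` method are invented names, and it keeps entries in memory rather than writing them anywhere.

```python
import itertools
import threading
import time

class SequencedLogger(object):
    """Hypothetical logger that tags each message with an arrival sequence number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count(1)
        self.entries = []

    def log(self, message):
        # Take the number and append under one lock so that list order and
        # sequence order agree, even when several threads are logging.
        with self._lock:
            seq = next(self._counter)
            entry = (seq, time.time(), message)
            self.entries.append(entry)
        return entry

if __name__ == "__main__":
    logger = SequencedLogger()
    for text in ("started", "data received", "stopped"):
        print(logger.log(text))
```

The timestamp is still recorded for spotting gaps, but ordering is decided by the sequence number, which is unique regardless of clock resolution.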

S.Lott
I needed time stamps because I need to know when the data is collected and to see when there are gaps in the data being produced.
Jon Cage
If your solution depends on clock accuracy, you'll have to find an OS that guarantees that your process is always the first thing that happens when the collected data arrives. Otherwise OS scheduling will bollix this up.
S.Lott
+2  A: 

Thank you all for your contributions - they've all been very useful. Brian's answer seems closest to what I eventually went with (i.e. deal with it but use a sort of unique identifier - see below) so I've accepted his answer. I managed to consolidate all the various data receivers into a single thread, which is where the timestamping is now done using my new AccurateTimeStamp class. What I've done works as long as the time stamp is the first thing to use the clock.

As S.Lott points out, without a realtime OS, they're never going to be absolutely perfect. I really only wanted something that would let me see when each incoming chunk of data was received relative to the others, so what I've got below will work well.

Thanks again everyone!

import time

class AccurateTimeStamp():
    """
    A simple class to provide a very accurate means of time stamping some data
    """

    # Do the class-wide initial time stamp to synchronise calls to 
    # time.clock() to a single time stamp
    initialTimeStamp = time.time()+ time.clock()

    def __init__(self):
        """
        Constructor for the AccurateTimeStamp class.
        This makes a stamp based on the current time which should be more 
        accurate than anything you can get out of time.time().
        NOTE: This time stamp will only work if nothing has called clock() in
        this instance of the Python interpreter.
        """
        # Get the time since the first call to time.clock()
        offset = time.clock()

        # Get the current (accurate) time
        currentTime = AccurateTimeStamp.initialTimeStamp+offset

        # Split the time into whole seconds and the portion after the fraction 
        self.accurateSeconds = int(currentTime)
        self.accuratePastSecond = currentTime - self.accurateSeconds


def GetAccurateTimeStampString(timestamp):
    """
    Function to produce a timestamp of the form "13:48:01.87123" representing 
    the time stamp 'timestamp'
    """
    # Get a struct_time representing the number of whole seconds since the 
    # epoch that we can use to format the time stamp
    wholeSecondsInTimeStamp = time.localtime(timestamp.accurateSeconds)

    # Convert the whole seconds and whatever fraction of a second comes after
    # into a couple of strings, zero-padding the fraction to six digits so
    # that e.g. 0.001234 s is rendered as "001234" rather than "1234"
    wholeSecondsString = time.strftime("%H:%M:%S", wholeSecondsInTimeStamp)
    fractionAfterSecondString = "%06d" % int(timestamp.accuratePastSecond*1000000)

    # Return our shiny new accurate time stamp   
    return wholeSecondsString+"."+fractionAfterSecondString


if __name__ == '__main__':
    for i in range(0,500):
        timestamp = AccurateTimeStamp()
        print GetAccurateTimeStampString(timestamp)
Jon Cage
A: 

I wanted to thank J. Cage for this last post.

For my work, "reasonable" timing of events across processes and platforms is essential. There are obviously lots of places where things can go askew (clock drift, context switching, etc.), however this accurate timing solution will, I think, help to ensure that the time stamps recorded are sufficiently accurate to see the other sources of error.

That said, there are a couple of details I wonder about that are explained in When MicroSeconds Matter. For example, I think time.clock() will eventually wrap. I think for this to work for a long running process, you might have to handle that.
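On newer Python versions (3.7+) the same anchor-plus-offset idea can be sketched with time.perf_counter_ns() instead of time.clock(), which no longer exists as of Python 3.8. The names `AnchoredClock` and `format_ns` below are made up for illustration. Python integers do not overflow, so the arithmetic itself will not wrap, but this is an untested assumption for very long-running processes on any particular platform, and the result will slowly drift from the system clock because the performance counter does not follow clock adjustments.

```python
import time

class AnchoredClock(object):
    """Hypothetical wall-clock anchor plus a high-resolution offset, in nanoseconds."""

    def __init__(self):
        # Read both clocks back to back; any pairing error stays constant.
        self._wall_anchor_ns = time.time_ns()
        self._perf_anchor_ns = time.perf_counter_ns()

    def now_ns(self):
        """Nanoseconds since the epoch, advanced by the performance counter."""
        elapsed_ns = time.perf_counter_ns() - self._perf_anchor_ns
        return self._wall_anchor_ns + elapsed_ns

def format_ns(stamp_ns):
    """Render a nanosecond stamp as HH:MM:SS.ffffff in local time."""
    seconds, remainder_ns = divmod(stamp_ns, 1000000000)
    return "%s.%06d" % (
        time.strftime("%H:%M:%S", time.localtime(seconds)),
        remainder_ns // 1000,
    )

if __name__ == "__main__":
    clock = AnchoredClock()
    for _ in range(5):
        print(format_ns(clock.now_ns()))
```

Unlike the time.clock() version, this does not depend on being the first caller of the clock, since each AnchoredClock instance records its own starting point.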

You're welcome :-)
Jon Cage
A: 

Thanks for your code! Looking to build it into a timer.

Thanks.. if you found it useful then vote it up :)
Jon Cage