views:

737

answers:

5

I have DateTime structure for an old data format that I don't have access to any specs for. There is a field which indicates the datetime of the the data, but it isn't in any format I recognize. It appears to be stored as a 32-bit integer, that increments by 20 for each day. Has anyone ever run across something like this?

EDIT:

Example: 1088631936 DEC = 80 34 E3 40 00 00 00 00 HEX = 09/07/2007

EDIT:

First off, sorry for the delay. I had hoped to do stuff over the weekend, but was unable to.

Second, this date format is weirder than I initially thought. It appears to be some sort of exponential or logarithmic method, as the dates do not change at an increasing rate.

Third, the defunct app that I have for interpreting these values only shows the date portion, so I don't know what the time portion is.

Example data: (Hex values are big-endian, dates are mm/dd/yyyy)

0x40000000 = 01/01/1900
0x40010000 = 01/01/1900
0x40020000 = 01/01/1900
0x40030000 = 01/01/1900
0x40040000 = 01/01/1900
0x40050000 = 01/01/1900
0x40060000 = 01/01/1900
0x40070000 = 01/01/1900
0x40080000 = 01/02/1900
0x40090000 = 01/02/1900
0x400A0000 = 01/02/1900
0x400B0000 = 01/02/1900
0x400C0000 = 01/02/1900
0x400D0000 = 01/02/1900
0x400E0000 = 01/02/1900
0x400F0000 = 01/02/1900
0x40100000 = 01/03/1900
0x40110000 = 01/03/1900
0x40120000 = 01/03/1900
0x40130000 = 01/03/1900
0x40140000 = 01/04/1900
0x40150000 = 01/04/1900
0x40160000 = 01/04/1900
0x40170000 = 01/04/1900
0x40180000 = 01/05/1900
0x40190000 = 01/05/1900
0x401A0000 = 01/05/1900
0x401B0000 = 01/05/1900
0x401C0000 = 01/06/1900
0x401D0000 = 01/06/1900
0x401E0000 = 01/06/1900
0x401F0000 = 01/06/1900
0x40200000 = 01/07/1900
0x40210000 = 01/07/1900
0x40220000 = 01/08/1900
0x40230000 = 01/08/1900
....
0x40800000 = 05/26/1901
0x40810000 = 06/27/1901
0x40820000 = 07/29/1901
....
0x40D00000 = 11/08/1944
0x40D10000 = 08/29/1947

EDIT: I finally figured this out, but since I've already given up the points for the bounty, I'll hold off on the solution in case anyone wants to give it a shot.

BTW, there is no time component to this, it is purely for storing dates.

+3  A: 

Are you sure that values correspond to 09/07/2007?

I ask because 1088631936 are the number of seconds since Linux (et al) zero date: 01/01/1970 00:00:00 to 06/30/2004 21:45:36.

Seems to me reasonable to think the value are seconds since this usual zero date.

Edit: I know it is very possible for this not to be the correct answer. It is just one approach (a valid one) but I think more info is needed (see the comments). Editing this (again) to bring the question to the front in the hope of somebody else to answer it or give ideas. Me: with a fairness, sportive and sharing spirit :D

vmarquez
+1  A: 

I'd say that vmarquez is close.

Here are dates 2009-3-21 and 2009-3-22 as unix epochtime:

In [8]: time.strftime("%s", (2009, 3, 21, 1, 1, 0, 0,0,0))
Out[8]: '1237590060'

In [9]: time.strftime("%s", (2009, 3, 22, 1, 1, 0, 0,0,0))
Out[9]: '1237676460'

And here they are in hex:

In [10]: print("%0x %0x" % (1237590060, 1237676460))
49c4202c 49c571ac

If you take only first 5 digits, the growth is 21. Which kinda matches your format, neg?

Pasi Savolainen
Thanks Pasi. The mathematics are simple, clean and very close to the description. Makes me wonder if "increment by 20 by day" is exact or is more like "like-20-by-day". Anyway, we are left with the epoch problem (what is zero date?).
vmarquez
+1  A: 

Some context would be useful. If your data file looks something, literally or at least figuratively, like this file, vmarquez is on the money.

http://www.slac.stanford.edu/comp/net/bandwidth-tests/eventanalysis/all_100days_sep04/node1.niit.pk

That reference is data produced by Available Bandwith Estimation tool (ABwE) -- the curious item is that it actually contains that 1088631936 value as well as the context. That example


 date     time       abw     xtr   dbcap   avabw   avxtr  avdbcap      rtt    timestamp
06/30/04 14:43:48  1.000   0.000   1.100   1.042   0.003    1.095  384.387   1088631828
06/30/04 14:45:36  1.100   0.000   1.100   1.051   0.003    1.096  376.408   1088631936
06/30/04 14:47:23  1.000   0.000   1.100   1.043   0.003    1.097  375.196   1088632043
seems to have a seven hour offset from the suggested 21:45:36 time value. (Probably Stanford local, running on Daylight savings time.)

Refactor
Now that's a great find! +1
vmarquez
A: 

Well, you've only shown us how your program uses 2 of the 8 digits, so we'll have to assume that the other 6 are ignored (because your program could be doing anything it wants with those other digits).

So, we could say that the input format is: 40mn0000 where m and n are two hex digits.

Then, the output is: 01/01/1900 + floor((2^(m+1)-2) + n*2^(m-3)) days

Explanation:

  1. In each example, notice that incrementing n by 1 increases the number of days by 2^(m-3).
  2. Notice that every time n goes from F to 0, m is incremented.

Using these two rules, and playing around with the numbers, you get the equation above. (Except for floor, which was added because the output doesn't display fractional days).

I suppose you could rewrite this by replacing the two separate hex variables m and n with a single 2-digit hex number H. However, I think that would make the equation a lot uglier.

mbeckish
Congratulations mbeckish. Confirmed your formula.
vmarquez
you missed the very first example I gave: 0x40E33480 = 09/07/2007
Kevin
+6  A: 

It's not integer, it's a 32 bit floating point number. I haven't quite worked out the format yet, it's not IEEE.

Edit: got it. 1 bit sign, 11 bit exponent with an offset of 0x3ff, and 20 bit mantissa with an implied bit to the left. In C, assuming positive numbers only:

double offset = pow(2, (i >> 20) - 0x3ff) * (((i & 0xfffff) + 0x100000) / (double) 0x100000);

This yields 0x40000000 = 2.0, so the starting date must be 12/30/1899.

Edit again: since you were so kind as to accept my answer, and you seem concerned about speed, I thought I'd refine this a little. You don't need the fractional part of the real number, so we can convert straight to integer using only bitwise operations. In Python this time, complete with test results. I've included some intermediate values for better readability. In addition to the restriction of no negative numbers, this version might have problems when the exponent goes over 19, but this should keep you good until the year 3335.

>>> def IntFromReal32(i):
        exponent = (i >> 20) - 0x3ff
        mantissa = (i & 0xfffff) + 0x100000
        return mantissa >> (20 - exponent)

>>> testdata = range(0x40000000,0x40240000,0x10000) + range(0x40800000,0x40830000,0x10000) + [1088631936]
>>> from datetime import date,timedelta
>>> for i in testdata:
        print "0x%08x" % i, date(1899,12,30) + timedelta(IntFromReal32(i))


0x40000000 1900-01-01
0x40010000 1900-01-01
0x40020000 1900-01-01
0x40030000 1900-01-01
0x40040000 1900-01-01
0x40050000 1900-01-01
0x40060000 1900-01-01
0x40070000 1900-01-01
0x40080000 1900-01-02
0x40090000 1900-01-02
0x400a0000 1900-01-02
0x400b0000 1900-01-02
0x400c0000 1900-01-02
0x400d0000 1900-01-02
0x400e0000 1900-01-02
0x400f0000 1900-01-02
0x40100000 1900-01-03
0x40110000 1900-01-03
0x40120000 1900-01-03
0x40130000 1900-01-03
0x40140000 1900-01-04
0x40150000 1900-01-04
0x40160000 1900-01-04
0x40170000 1900-01-04
0x40180000 1900-01-05
0x40190000 1900-01-05
0x401a0000 1900-01-05
0x401b0000 1900-01-05
0x401c0000 1900-01-06
0x401d0000 1900-01-06
0x401e0000 1900-01-06
0x401f0000 1900-01-06
0x40200000 1900-01-07
0x40210000 1900-01-07
0x40220000 1900-01-08
0x40230000 1900-01-08
0x40800000 1901-05-26
0x40810000 1901-06-27
0x40820000 1901-07-29
0x40e33480 2007-09-07
Mark Ransom
Wouldn't that give 0x40220000 -> 01/06/1900?
mbeckish
No, it's correct. 0x40220000=9.0, add to 12/30/1899 gives 12/39/1899; subtract 31 days and carry the month, leaves you 1/08/1900.
Mark Ransom
Well done. Your answer is not only right, it's considerably faster than the way I was doing it. Have a bounty.
Kevin