Hello,

I'm looking for a "nice numbers" algorithm for determining the labels on a date/time value axis. I'm familiar with Paul Heckbert's Nice Numbers algorithm (http://tinyurl.com/5gmk2c).

I have a plot that displays time/date on the X axis and the user can zoom in and look at a smaller time frame. I'm looking for an algorithm that picks nice dates to display on the ticks.

For example:

  • Looking at a day or so: 1/1 12:00, 1/1 4:00, 1/1 8:00...
  • Looking at a week: 1/1, 1/2, 1/3...
  • Looking at a month: 1/09, 2/09, 3/09...

The nice label ticks don't need to correspond to the first visible point, but close to it.

Is anybody familiar with such an algorithm?

Thanks

+1  A: 

Still no answer to this question... I'll throw my first idea in then! I assume you have the range of the visible axis.

This is probably how I would do it.

Rough pseudo:

// quantify range
rangeLength = endOfVisiblePart - startOfVisiblePart;

// qualify range resolution
if (rangeLength < "1.5 days") {
    resolution = "day";  // it can be a number, e.g.: ..., 3 for day, 4 for week, ...
} else if (rangeLength < "9 days") {
    resolution = "week";
} else if (rangeLength < "35 days") {
    resolution = "month";
} // you can expand this in both ways to get from nanoseconds to geological eras if you wish

After that, it should (depending on what you have easy access to) be quite easy to determine the value of each nice label tick. Depending on the 'resolution', you format it differently, e.g. MM/DD for "week", MM:SS for "minute", etc., just like you said.
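To make the pseudocode concrete, here is a minimal Python sketch of the same idea. The thresholds are the ones from the pseudocode; the names `RESOLUTIONS` and `pick_resolution`, the strftime patterns, and the "year" fallback are my own assumptions for illustration, so adjust them to taste.

```python
from datetime import datetime, timedelta

# Thresholds copied from the pseudocode above; the strftime patterns and the
# "year" fallback are assumptions made for this sketch.
RESOLUTIONS = [
    (timedelta(days=1.5), "day", "%m/%d %H:%M"),
    (timedelta(days=9), "week", "%m/%d"),
    (timedelta(days=35), "month", "%m/%d"),
]


def pick_resolution(start, end):
    """Return (resolution name, label format) for the visible range."""
    range_length = end - start
    for limit, name, fmt in RESOLUTIONS:
        if range_length < limit:
            return name, fmt
    return "year", "%m/%Y"  # assumed fallback for anything longer


name, fmt = pick_resolution(datetime(2009, 1, 1), datetime(2009, 1, 6))
print(name, datetime(2009, 1, 3).strftime(fmt))  # week 01/03
```

Keeping the thresholds in a list makes it easy to extend the table in either direction without touching the function.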

M. Joanis
Things like "1.5 days", "9 days", etc. are highly language dependent in terms of implementation (in my view). For example, in C or even C++, I would just use an unsigned long to hold the difference in milliseconds between both times, whereas in Java, I would probably create a Time or Moment class, and there are probably already some of those somewhere...
M. Joanis
+2  A: 

The 'nice numbers' article you linked to mentioned that

the nicest numbers in decimal are 1, 2, 5 and all power-of-10 multiples of these numbers

So I think for doing something similar with date/time you need to start by similarly breaking down the component pieces. So take the nice factors of each type of interval:

  • If you're showing seconds or minutes, use 1, 2, 3, 5, 10, 15, 30 (I skipped 6, 12, 20 because they don't "feel" right).
  • If you're showing hours, use 1, 2, 3, 4, 6, 8, 12.
  • For days, use 1, 2, 7.
  • For weeks, use 1, 2, 4 (13 and 26 fit the model but seem too odd to me).
  • For months, use 1, 2, 3, 4, 6.
  • For years, use 1, 2, 5 and power-of-10 multiples.

Now obviously this starts to break down as you get into larger amounts. Certainly you don't want to show 5 weeks' worth of minutes, even in "pretty" intervals of 30 minutes or something. On the other hand, when you only have 48 hours' worth, you don't want to show 1-day intervals. The trick, as you have already pointed out, is finding decent transition points.

Just on a hunch, I would say a reasonable crossover point would be about twice as much as the next interval. That would give you the following (min and max number of intervals shown afterwards)

  • use seconds if you have less than 2 minutes worth (1-120)
  • use minutes if you have less than 2 hours worth (2-120)
  • use hours if you have less than 2 days worth (2-48)
  • use days if you have less than 2 weeks worth (2-14)
  • use weeks if you have less than 2 months worth (2-8/9)
  • use months if you have less than 2 years worth (2-24)
  • otherwise use years (although you could continue with decades, centuries, etc if your ranges can be that long)
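As a rough illustration of the crossover rule in the list above, the sketch below picks the interval type from the length of the range in seconds. The names `CROSSOVERS` and `pick_unit` are hypothetical, and treating a month as 30 days and a year as 365 days is an approximation I'm assuming to keep it short.

```python
# Unit lengths in seconds. Treating a month as 30 days and a year as 365 days
# is an approximation assumed for this sketch.
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
WEEK = 7 * DAY
MONTH = 30 * DAY
YEAR = 365 * DAY

# (unit name, length of the next larger unit), smallest unit first.
CROSSOVERS = [
    ("seconds", MINUTE),
    ("minutes", HOUR),
    ("hours", DAY),
    ("days", WEEK),
    ("weeks", MONTH),
    ("months", YEAR),
]


def pick_unit(span_seconds, factor=2):
    """Use a unit while the span is shorter than `factor` times the next unit."""
    for unit, next_unit in CROSSOVERS:
        if span_seconds < factor * next_unit:
            return unit
    return "years"


print(pick_unit(90))         # seconds
print(pick_unit(36 * HOUR))  # hours
print(pick_unit(10 * DAY))   # days
```

The `factor` argument is there so the 2x hunch can be tuned by trial and error.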

Unfortunately, our inconsistent time intervals mean that you end up with some cases that can have over a hundred intervals while others have at most 8 or 9. So you'll want to pick the size of your intervals such that you don't have more than 10-15 intervals at most (or fewer than 5 for that matter). Also, you could break from a strict definition of 2 times the next biggest interval if you think it's easier to keep track of. For instance, you could use hours up to 3 days (72 hours) and weeks up to 4 months. A little trial and error might be necessary.

So to go back over, choose the interval type based on the size of your range, then choose the interval size by picking one of the "nice" numbers that will leave you with between 5 and about 15 tick marks. Or if you know and/or can control the actual number of pixels between tick marks you could put upper and lower bounds on how many pixels are acceptable between ticks (if they are spaced too far apart the graph may be hard to read, but if there are too many ticks the graph will be cluttered and your labels may overlap).
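For the second half of that recipe (choosing the interval size once the interval type is known), a small sketch might look like this. `NICE_STEPS` repeats the nice factors from earlier in this answer; the function name `pick_step` and the default of 15 ticks are assumptions, and years are left out because their "1, 2, 5 times a power of 10" rule needs the usual decimal nice-numbers treatment instead of a fixed list.

```python
# Nice multiples per unit, taken from the list earlier in this answer.
NICE_STEPS = {
    "seconds": [1, 2, 3, 5, 10, 15, 30],
    "minutes": [1, 2, 3, 5, 10, 15, 30],
    "hours": [1, 2, 3, 4, 6, 8, 12],
    "days": [1, 2, 7],
    "weeks": [1, 2, 4],
    "months": [1, 2, 3, 4, 6],
}


def pick_step(unit, count, max_ticks=15):
    """Return the smallest nice step giving at most max_ticks intervals.

    `count` is the visible range measured in `unit`s.
    """
    steps = NICE_STEPS[unit]
    for step in steps:
        if count / step <= max_ticks:
            return step
    return steps[-1]  # range too long for this unit; use the coarsest step


print(pick_step("minutes", 90))  # 10
print(pick_step("hours", 48))    # 4
print(pick_step("days", 10))     # 1
```

Taking the smallest step that fits also gives the most ticks the limit allows, so if the result still has fewer than about 5 ticks, that is a hint the interval type is too coarse for the range.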

Rob Van Dam
A: 

I'd suggest you grab the source code to gnuplot or RRDTool (or even Flot) and examine how they approach this problem. The general case is likely to be N labels applied based on the width of your plot, with some kind of 'snapping' to the nearest 'nice' number.

Every time I've written such an algorithm (too many times really), I've used a table of 'preferences'... ie: based on the time range on the plot, decide if I'm using Weeks, Days, Hours, Minutes etc as the main axis point. I usually included some preferred formatting, as I rarely want to see the date for each minute I plot on the graph.
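A table of preferences along those lines could be sketched as below. The table entries, the function name `tick_labels` and the `min_label_px` parameter are all illustrative assumptions of mine, not something taken from gnuplot, RRDTool or Flot, and the sketch expects timezone-aware UTC datetimes.

```python
from datetime import datetime, timezone

# (step in seconds, strftime pattern) from fine to coarse. The entries are
# illustrative preferences, not taken from gnuplot, RRDTool or Flot.
TICK_TABLE = [
    (60, "%H:%M"),
    (5 * 60, "%H:%M"),
    (15 * 60, "%H:%M"),
    (3600, "%H:%M"),
    (6 * 3600, "%m/%d %H:%M"),
    (86400, "%m/%d"),
    (7 * 86400, "%m/%d"),
]


def tick_labels(start, end, width_px, min_label_px=80):
    """Label ticks for [start, end] on an axis that is width_px wide."""
    max_labels = max(width_px // min_label_px, 1)
    span = (end - start).total_seconds()
    # First table entry coarse enough to fit; else the coarsest one.
    step, fmt = next(
        (row for row in TICK_TABLE if span / row[0] <= max_labels),
        TICK_TABLE[-1],
    )
    # Snap the first tick up to a multiple of the step since the Unix epoch.
    first = -(-int(start.timestamp()) // step) * step
    labels = []
    t = first
    while t <= end.timestamp():
        labels.append(datetime.fromtimestamp(t, timezone.utc).strftime(fmt))
        t += step
    return labels


start = datetime(2009, 1, 1, 0, 10, tzinfo=timezone.utc)
end = datetime(2009, 1, 1, 3, 0, tzinfo=timezone.utc)
print(tick_labels(start, end, width_px=400))  # ['01:00', '02:00', '03:00']
```

Snapping to multiples of the step since the epoch is the simplest choice, but it has a known wart: weekly ticks land on Thursdays, because 1970-01-01 was a Thursday. Weeks and months need calendar-aware snapping instead.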

I'd be happy but surprised to find someone using a formula (like Heckbert does) to find 'nice', as the variation in time units between minutes, hours, days, and weeks is not that linear.

ericslaw
A: 

Have a look at

http://tools.netsa.cert.org/netsa-python/doc/index.html

It has a nice.py (python/netsa/data/nice.py) which I think is stand-alone, and should work fine.

Avind
A: 

[Edit - I expanded this a little more at http://www.acooke.org/cute/AutoScalin0.html ]

A naive extension of the "nice numbers" algorithm seems to work for base 12 and 60, which gives good intervals for hours and minutes. This is code I just hacked together:

from math import ceil, floor, log, log10

LIM10 = (10, [(1.5, 1), (3, 2), (7, 5)], [1, 2, 5])
LIM12 = (12, [(1.5, 1), (3, 2), (8, 6)], [1, 2, 6])
LIM60 = (60, [(1.5, 1), (20, 15), (40, 30)], [1, 15, 40])


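def heckbert_d(lo, hi, ntick=5, limits=None):
    '''
    Tick spacing from Heckbert's "nice numbers" algorithm ("Graphics Gems"),
    generalised to the number base given in limits.
    '''
    if limits is None:
        limits = LIM10
    # base, (threshold, result) pairs for rounding, candidates for ceiling
    (base, rfs, fs) = limits
    def nicenum(x, rounding):
        # scale x by the largest power of base not exceeding it
        step = base ** floor(log(x)/log(base))
        f = float(x) / step
        nf = base
        if rounding:
            # round f to a nearby nice fraction
            for (a, b) in rfs:
                if f < a:
                    nf = b
                    break
        else:
            # take the smallest nice fraction that is >= f
            for a in fs:
                if f <= a:
                    nf = a
                    break
        return nf * step
    delta = nicenum(hi-lo, False)
    return nicenum(delta / (ntick-1), True)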


def heckbert(lo, hi, ntick=5, limits=None):
    '''
    Heckbert's "nice numbers" algorithm for graph ranges, from "Graphics Gems".
    '''
    def _heckbert():
        d = heckbert_d(lo, hi, ntick=ntick, limits=limits)
        graphlo = floor(lo / d) * d
        graphhi = ceil(hi / d) * d
        fmt = '%' + '.%df' %  max(-floor(log10(d)), 0)
        value = graphlo
        while value < graphhi + 0.5*d:
            yield fmt % value
            value += d
    return list(_heckbert())

So, for example, if you want to display seconds from 0 to 60,

>>> heckbert(0, 60, limits=LIM60)
['0', '15', '30', '45', '60']

or hours from 0 to 5:

>>> heckbert(0, 5, limits=LIM12)
['0', '2', '4', '6']
andrew cooke