Hi,

I've got a k-nearest-neighbour problem where some of the dimensions are closed loops. For example, one is 'time of day', and I'm matching for similarity, so 'very early morning' should be close to 'late evening'. You can't just make it a linear scale running from 'very early morning' at one end to 'late evening' at the other.

How can I represent this in the data model? Is there an established way to handle this or a way to work around it?

+2  A: 

I don't know of an established way to handle this, but two ideas suggest themselves:

  • Measure distance using modular arithmetic of some kind.
  • Map times of day to angles on the clock and measure distance as the smaller of the two angles between the times. (This also needs modular arithmetic so might really just be a complicated way of implementing the first suggestion.)

All this presumes that, as you indicate, you only have times of day and not times since some start point, i.e. 12:25 rather than 12:35 on 5th May 2009.
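As a rough illustration of the modular-arithmetic idea, here is a minimal Python sketch. It assumes times of day are stored as hours in the range 0 to 24, and the names circular_distance, knn_distance and periods are made up for this example rather than taken from any library. Each looped dimension uses the shorter way round the loop, and the per-dimension distances are then combined in the usual Euclidean way.

```python
def circular_distance(a, b, period=24.0):
    """Shortest distance between a and b on a loop of length `period`."""
    d = abs(a - b) % period
    # Going the other way round the loop may be shorter.
    return min(d, period - d)


def knn_distance(p, q, periods):
    """Euclidean-style distance between points p and q.

    periods[i] is the loop length for dimension i,
    or None if that dimension is an ordinary linear scale.
    """
    total = 0.0
    for x, y, period in zip(p, q, periods):
        if period is None:
            d = abs(x - y)
        else:
            d = circular_distance(x, y, period)
        total += d * d
    return total ** 0.5


# 23:30 and 00:30 are one hour apart, not 23 hours.
print(circular_distance(23.5, 0.5))
# One looped dimension (hour of day) and one linear dimension.
print(knn_distance((23.0, 0.0), (2.0, 4.0), (24.0, None)))
```

A function like knn_distance could be passed to a brute-force nearest-neighbour search in place of plain Euclidean distance. Whether it can be used with a particular tree-based index depends on that index supporting custom metrics.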

High Performance Mark
Thanks, modular arithmetic was what I was looking for.
Tomas