tags:

views:

15

answers:

1

I have a data table that is periodically updated by another service. I log when these updates occur in the database. I'd like to be able to somehow rate the most recently updated data records.

I don't want to select exactly the most recent, I'd like to sort of account for previous updates as well. I'll try and explain what I mean using an example. Suppose I have 3 data items

Item |  Updates (Days Ago)
One  |  30, 25, 19, 1
Two  |  5, 3, 2
Three|  30, 25, 20, 15, 10, 5

So judging by the above list, I'd like to have Three first because it appears to be the most consistent and has the most updates. Next I would like two because it has been active as of late. Then finally one because despite it having the most recent update, it had few updates in the iterim time window.

I haven't outlined exactly how my algorithm will work, as I don't know yet, but I've hopefully explained what I'm generally hoping to achieve.

I'm not explicitly looking for an implementation, I'm looking to be pointed in a helpful direction. Are there specific algorithms that address this problem, or a similar problem?

A: 

You could create some form of weighted equation. Say you want the following criteria:

  • Number of updates (numUpdates)

  • 1 / days since last post (lastPost)

  • Average updates per day since creation (avgUpdates)

Then, you could give each criteria a specific weight. E.G,

rating = (weight1 * numUpdates) + (weight2 * lastPost) + (weight3 * avgUpdates)

Sam Dufel
For the consistency portion, you could calculate the standard deviation of each update value from the overall average and then calculate the average of that. What I'm trying to describe is called the average absolute deviation, see this: http://en.wikipedia.org/wiki/Average_absolute_deviation
RibaldEddie
Got me thinking in the right direction. I'll probably go with a weighted moving average.
Mark