views:

102

answers:

3

Each product a product_date_added which is a Date field contained the date it was added. They also have a product_views which is an int field containing how many times a product has been viewed.

To display products by popularity, I us an algorithm to calculate how many hits per day a product has.

SELECT 
    AVG(product_views / DATEDIFF(NOW(), product_date_added)) as avg_hits
    , product_table.* 
FROM product_table
WHERE product_available = "yes" 
GROUP BY product_id
ORDER BY avg_hits DESC

This works, but the boss is noticing a lot of older products showing up first. So he basically wants newer views to have more weight than older views.

His suggestion was that any views over a year old don't count. I think I would have to keep a date of every view in order to do that, which I think would slow down performance.

What is the best way to create a popularity algorithm like what my boss is asking for?

Ideally I would want to be able to come up with something that doesn't alter the table structure. If that is not possible, I would at least like to come up with a solution that can use the existing data so we are not starting from 0. If thats not possible either than anything that will work.

+2  A: 

You wouldn't (as such) have to keep the date of every view. You could instead keep up to 366 rows per item in a table with columns: product_id, day_of_year, count. Each day run a task to zero all the counts from one year ago. If you don't mind denormalised data, that task could also update the "count" field in the item itself for fast retrieval, so that your query does not need to change. product_views just becomes product_views_in_the_last_year. The time period of 1 day is arbitrary - I doubt you care that the popularity is based on a window of precisely 1 year, so I expect it could just as well be an hour, a week or a fortnight, depending how many buckets you're willing to deal with.

An alternative scheme might be to use exponential decay. Turn the count field into a decimal type. Once per day, reduce the count of each item by a fixed percentage (less than 1%, more than 0.1%), so that the more recent a hit is, the more "weight" it has. So old popularity never dies entirely, but hits from a year ago won't contribute much. An equivalent to this scheme, by the way, is to leave the code as it is but ensure that your website as a whole becomes exponentially more popular over time ;-)

As for avoiding starting from zero - maybe reduce each item's count immediately, as a one-off action, by a proportion which depends on the age of the item. On the whole you'd expect that older objects have older views, and hence are over-rated by the current scheme. That's not fool-proof, since maybe some older items have recently gained a lot of hits. You might be able to identify these items by looking at recent web server logs, or by spending a week or a month counting hits before doing the one-off reduction. Even without doing that, if there's a fundamental reason for their popularity (not just because they're currently rating high in your rankings and hence receiving traffic from people looking at your "most popular" chart), then hopefully they'll recover given time.

Steve Jessop
I really like the reduce by small percentage everyday idea. With that there wont be a sharp cut off at 1 year ago but a gradual decay everyday. On your 3rd paragraph, maybe I could start by reducing by 10% times years active. So items added within a year will loose nothing and items from 3 years ago would loose 30%. Thanks.
John Isaacks
@John: yes, I'm vague about the numbers deliberately, because you'll probably want to try different values and look at the results until you have a chart that's roughly what you want. Or if you don't know what you want, at least a chart that no longer definitely looks wrong to you or your boss :-)
Steve Jessop
I thought of reducing by a fraction of the average views/day across *all* products (but not below 0).
Hugh Brackett
+1  A: 

this http://stackoverflow.com/questions/3705537/sorting-a-list-of-songs-by-popularity seems to be a similar problem

bronzebeard
That other question seems to be mostly about how to take account of downvotes.
Steve Jessop
true, but IMHO the solution had some beautiful maths and was general enough
bronzebeard
+1  A: 

You might want to check out this blog post. It's aimed at App Engine, but the technique is general. The basic approach is to have a popularity that decays exponentially, and is incremented any time you record a vote/download/whatever.

Nick Johnson