I have two tables, news and news_views. Every time an article is viewed, the news id, IP address and date are recorded in news_views.

I'm using a query with a subquery to fetch the most viewed titles from news, by getting the total count of views in the last 24 hours for each one.

It works fine, except that it takes 5-10 seconds to run, presumably because there are hundreds of thousands of rows in news_views and it has to go through the entire table before it can finish. The query is below. Is there any way to improve it?

SELECT n.title
,      nv.views
FROM   news n
LEFT
JOIN   (
       SELECT news_id
       ,      count( DISTINCT ip ) AS views
       FROM   news_views
       WHERE  datetime >= SUBDATE(now(), INTERVAL 24 HOUR)
       GROUP
       BY     news_id
       ) AS nv
ON     nv.news_id = n.id
ORDER
BY     views DESC
LIMIT  15
A:

I don't think you need to calculate the count of views as a derived table:

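-- Note: the WHERE filter on nv.datetime discards the NULL rows that the
-- LEFT JOIN produces, so articles with no views in the last 24 hours are
-- left out (the join effectively behaves like an inner join).
SELECT n.id, n.title, count( DISTINCT nv.ip ) AS views
FROM   news n
LEFT JOIN news_views nv
ON     nv.news_id = n.id
WHERE  nv.datetime >= SUBDATE(now(), INTERVAL 24 HOUR)
GROUP BY n.id, n.title
ORDER BY views DESC
LIMIT  15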

The best advice here is to run these queries through MySQL's EXPLAIN to see what the query will actually do: which indexes it uses, whether it scans whole tables, how many rows it expects to examine, and so on. Avoid full table scans.
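
As a rough sketch of that advice, the statements below prefix the rewritten query with EXPLAIN and then add an index on news_views. The index name and its column order are my own assumptions, not something taken from your schema, so treat it as something to test rather than a guaranteed fix.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT n.id, n.title, COUNT(DISTINCT nv.ip) AS views
FROM   news n
LEFT JOIN news_views nv
ON     nv.news_id = n.id
WHERE  nv.datetime >= SUBDATE(NOW(), INTERVAL 24 HOUR)
GROUP BY n.id, n.title
ORDER BY views DESC
LIMIT  15;

-- Hypothetical index (the name and column order are assumptions).
-- Leading with datetime lets the 24-hour filter use an index range scan;
-- news_id and ip are included so the index holds every news_views column
-- the query reads.
CREATE INDEX idx_news_views_datetime_news_ip
    ON news_views (datetime, news_id, ip);
```

In the EXPLAIN output, a `type` of `ALL` on the news_views row means a full table scan, and a NULL `key` means no index was chosen. Run EXPLAIN again after creating the index to check whether the plan actually changed.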

matt b
Your query's dropped the run time down to under 0.5 seconds, thanks!
Dan