views:

72

answers:

2

I have a web page where users upload&watch videos. Last week I asked what is the best way to track video views so that I could display the most viewed videos this week (videos from all dates).

Now I need some help optimizing a query with which I get the videos from the database. The relevant tables are this:

video (~239371 rows)
VID(int), UID(int), title(varchar), status(enum), type(varchar), is_duplicate(enum), is_adult(enum), channel_id(tinyint)

signup (~115440 rows)
UID(int), username(varchar)

videos_views (~359202 rows after 6 days of collecting data, so this table will grow rapidly)
videos_id(int), views_date(date), num_of_views(int)

The table video holds the videos, signup hodls users and videos_views holds data about video views (each video can have one row per day in that table).

I have this query that does the trick, but takes ~10s to execute, and I imagine this will only get worse over time as the videos_views table grows in size.

SELECT
 v.VID, 
 v.title, 
 v.vkey, 
 v.duration, 
 v.addtime, 
 v.UID, 
 v.viewnumber, 
 v.com_num, 
 v.rate, 
 v.THB, 
 s.username,
 SUM(vvt.num_of_views) AS tmp_num
FROM
 video v
  LEFT JOIN videos_views vvt ON v.VID = vvt.videos_id
  LEFT JOIN signup s on v.UID = s.UID
WHERE
 v.status = 'Converted'
 AND v.type = 'public'
 AND v.is_duplicate = '0'
 AND v.is_adult = '0'
 AND v.channel_id <> 10
 AND vvt.views_date >= '2001-05-11'
GROUP BY
 vvt.videos_id
ORDER BY
 tmp_num DESC
LIMIT
 8

All the relevant fields are indexed. And here is a screenshot of the EXPLAIN result: alt text

So, how can I optimize this?

UPDATE This is my query based on the answer by Quassnoi. It returns the correct videos, but it messes up the JOIN on the signup table. For some records the username field is NULL, for others it contains the wrong username.

SELECT
    v.VID,
    v.title,
    v.vkey,
    v.duration,
    v.addtime,
    v.UID,
    v.viewnumber,
    v.com_num,
    v.rate,
    v.THB,
    s.username
FROM
    (SELECT
        videos_id,
        SUM(num_of_views) AS tmp_num
    FROM
        videos_views
    WHERE
        views_date >= '2010-05-13'
    GROUP BY
        videos_id
    ) q
        JOIN video v ON v.VID = q.videos_id
        LEFT JOIN signup s ON s.UID = v.VID
WHERE
    v.type = 'public'
    AND v.channel_id <> 10
    AND v.is_adult = '0'
    AND is_duplicate = '0'
ORDER BY
    tmp_num DESC
LIMIT
    8

Here is the resultset: alt text

+1  A: 

Create the following index:

video_views (views_date, videos_id)

, and get rid of the LEFT JOIN between videos and views (it does not work with your current query, anyway):

SELECT  *
FROM    (
        SELECT  videos_id, SUM(num_of_views) AS tmp_num
        FROM    video_views
        GROUP BY
                videos_id
        ) q
JOIN    videos v
ON      v.vid = q.videos_id
LEFT JOIN
        signup s
ON      s.UID = v.UID
ORDER BY
        tmp_num DESC
LIMIT 8

If you want to return zero for videos that had never been viewed, change the order of fields in the index:

video_views (videos_id, views_date)

and rewrite the query:

SELECT  *,
        (
        SELECT  COALESCE(SUM(num_of_views), 0)
        FROM    video_views vw
        WHERE   vw.videos_id = v.vid
                AND views_date >= '2001-05-11'
        ) AS tmp_num
FROM    videos v
LEFT JOIN
        signup s
ON      s.UID = v.UID
ORDER BY
        tmp_num DESC
LIMIT 8
Quassnoi
sorry, but this two queries don't produce the results I would want. The first one only returns the first video that is in the "video" table. The second one returns 8 videos, each having tmp_num = 0.
Jan Hančič
@janh: corrected, sorry.
Quassnoi
I'm heading home now, but I did a quick check and it seems this produces the same results as my query, only much much faster. Will look into it tomorrow. Thanks for now!
Jan Hančič
Your first query work's best for me. But it joins the wrong users. It returns null for some records (if I select the signup.username), and a wrong username for some. Any idea what might be wrong? See my update to see what my query looks like now.
Jan Hančič
@jahn: apparently I confused the fields to join on (`VID` and `UID`). Please try now.
Quassnoi
*sigh* I feel stupid, don't know how I didn't saw that ... thanks man, it works now!
Jan Hančič
+2  A: 

Yeah, ORDER BY on a computed column is always going to be unindexable. Sorry.

If you're going to be doing this query a lot and you want to avoid the views for each video having to be counted and ordered each time, you'll have to denormalise. Add a views_in_last_week column, recalculate it from videos_views in the background each day, and index it (possibly in a compound index with other relevant WHERE conditions).

bobince
+1 - there simply is no way to make that efficient. THe trick is to get rid of the real time aspect by recomputing them regularly. Like weekly, daily, even hourly. Group by on a calculated column => Pewrformance killer per definition.
TomTom