views:

72

answers:

2
SELECT
    b.categoryid,
    SUM(viewcount) AS cnt,
    categoryname
FROM
    bookvisit AS bv
    INNER JOIN book AS b ON b.isbn = bv.isbn
    LEFT JOIN category AS c ON b.categoryid = c.categoryid
WHERE
    b.categoryid IS NOT NULL AND
    b.categoryid <> 0 
GROUP BY
    b.categoryid 
ORDER BY
    cnt DESC,
    bv.isbn
LIMIT 0, 4

I have three tables - book (contains books information) bookvisit (book visit info) category (category master )

What I need is popular categories, above query is fine with two eq_ref but it has a Using temporary; Using filesort also

any help ?

+1  A: 

Based on your comment from yesterday, I think the problem is you are doing an ORDER BY on a calculated (aggregate) column, so it can't use an index.

The only way around that would be to add a category.viewcount column that you update when you update book.viewcount. It wouldn't take up much extra space. The downsides of this solution are:

  1. When a book's category changes (which I assume is rare), you'll need to subtract book.viewcount from the old category and add it to the new category
  2. Lock contention on the category tables might be a performance problem

If this were me and performance was essential, I would write a separate service that would read in the current stats into memory on startup, and have the system send updates to it and query it to get the latest stats. That saves locking and writing the category table.

Of course if the numbers don't have to be 100% up to date you can always run the query overnight and cache the results for the day.

dj_segfault
A: 

As @dj_segfault said, you cannot have an index on an aggregate column in MySQL and you will have to write a service that would cache the SUMs in a shapshot table (which you can index).

Here's how you can do it and still have the accurate statistics:

  1. Create a snapshot table:

    category  cnt
    

    with a PRIMARY KEY on category.

  2. Create a single-field, single-record table called snapshot_time:

    taken
    
  3. On a timely basis, fill this table with the query:

    UPDATE  snapshot_time
    SET     taken = NOW()
    
    
    INSERT  
    INTO    snapshot
    SELECT  b.category, COUNT(*) AS new_cnt,
            (
            SELECT  taken
            FROM    snapshot_time
            ) AS new_taken
    FROM    bookvisit bv
    JOIN    book b
    ON      b.isbn = bv.isbn
    WHERE   bv.visit_time <=
            (
            SELECT  taken
            FROM    snapshot_time
            )
    ON DUPLICATE KEY UPDATE
    SET     cnt = new_cnt,
            snapshot_taken = new_taken
    
  4. Create the following indexes:

    snapshot (cnt)
    bookvisit (visit_time)
    book (category)
    
  5. Run this query:

    SELECT  category,
            cnt +
            (
            SELECT  COUNT(*)
            FROM    bookvisit bv
            JOIN    book b
            ON      b.isbn = bv.isbn
            WHERE   bv.visit_time > 
                    (
                    SELECT  taken
                    FROM    shapshot_time
                    )
                    AND b.category = s.category
            ) AS total
    FROM    snapshot
    WHERE   cnt >=
            (
            SELECT  cnt
            FROM    snapshot
            ORDER BY
                    cnt DESC
            LIMIT 4
            )
            -
            (
            SELECT  COUNT(*)
            FROM    bookvisit
            WHERE   bv.visit_time > 
                    (
                    SELECT  taken
                    FROM    shapshot_time
                    )
            )
    ORDER BY
            total DESC
    LIMIT 4
    

The query will return you accurate visit count.

The main idea is that you need to scan only the records in bookvisit that were collected after the statistics were cached.

More than that: you don't even have to scan all records in the cached statistics. Since the number of visits only grows, you can only scan the results that can possibly get into the first four.

If the 4th record has 1,000,000 page views in the snapshot, and 1,000 page views happened after you took the snapshot, you can only select the records from the snapshot with cnt >= 999,000. The other records could not theoretically hit this limit, since it would take more than 1K page views.

The only problem is that you can delete the books or change their categories. In this case you would just need to recalculate the statistics or fall back to your original method.

Quassnoi
hi Quassnoi, Thanks for the reply. I read your site explainextended.com regularly and I have also sent the same query thing from there also. I will try for this solution.
gajendra.bang