As @dj_segfault said, you cannot have an index on an aggregate column in MySQL,
so you will have to write a service that caches the aggregates in a snapshot table (which you can index).
Here's how you can do it and still get accurate statistics:
Create a snapshot table with the columns category, cnt and snapshot_taken (the refresh statement below writes all three), with a PRIMARY KEY on category.
Create a single-field, single-record table called snapshot_time whose only column is taken.
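A minimal sketch of the two tables as MySQL DDL. The column types, the InnoDB engine and the seed row are assumptions rather than part of the description above; adjust them to your schema.

```sql
-- Assumed types: match category to book.category in your schema.
CREATE TABLE snapshot (
    category       VARCHAR(64) NOT NULL,
    cnt            BIGINT      NOT NULL,  -- signed, so "cnt minus new visits" may go below zero without an error
    snapshot_taken DATETIME    NOT NULL,  -- time of the snapshot that produced this row
    PRIMARY KEY (category)
) ENGINE=InnoDB;

CREATE TABLE snapshot_time (
    taken DATETIME NOT NULL
) ENGINE=InnoDB;

-- Seed the single record; without it, UPDATE snapshot_time matches no rows.
INSERT INTO snapshot_time (taken) VALUES ('1970-01-01 00:00:00');
```

With the seed row set far in the past, every visit counts as "after the snapshot" until the first refresh runs, so the results stay correct even before any statistics are cached.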
Periodically refresh the snapshot with these two statements:

    -- 1. Record the moment the new snapshot refers to
    UPDATE  snapshot_time
    SET     taken = NOW();

    -- 2. Cache per-category counts of all visits up to that moment
    INSERT
    INTO    snapshot (category, cnt, snapshot_taken)
    SELECT  b.category, COUNT(*),
            (
            SELECT  taken
            FROM    snapshot_time
            )
    FROM    bookvisit bv
    JOIN    book b
    ON      b.isbn = bv.isbn
    WHERE   bv.visit_time <=
            (
            SELECT  taken
            FROM    snapshot_time
            )
    GROUP BY
            b.category
    ON DUPLICATE KEY UPDATE
            cnt = VALUES(cnt),
            snapshot_taken = VALUES(snapshot_taken);
Create the following indexes:
snapshot (cnt)
bookvisit (visit_time)
book (category)
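As a sketch, the three indexes could be created like this. The index names are hypothetical, and I'm assuming none of these columns is already indexed.

```sql
-- Lets the "4th highest cnt" lookup and the cnt >= cutoff filter use an index
CREATE INDEX ix_snapshot_cnt         ON snapshot  (cnt);

-- Lets the "visits after the snapshot" range scan skip older rows
CREATE INDEX ix_bookvisit_visit_time ON bookvisit (visit_time);

-- Supports the per-category filter on book
CREATE INDEX ix_book_category        ON book      (category);
```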
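Run this query:

    SELECT  s.category,
            s.cnt +
            (
            -- visits to this category recorded after the snapshot
            SELECT  COUNT(*)
            FROM    bookvisit bv
            JOIN    book b
            ON      b.isbn = bv.isbn
            WHERE   bv.visit_time >
                    (
                    SELECT  taken
                    FROM    snapshot_time
                    )
                    AND b.category = s.category
            ) AS total
    FROM    snapshot s
    WHERE   s.cnt >=
            -- 4th highest cached count (0 if there are fewer than four categories)
            COALESCE(
            (
            SELECT  cnt
            FROM    snapshot
            ORDER BY
                    cnt DESC
            LIMIT 3, 1
            ), 0)
            -
            -- total number of visits recorded after the snapshot
            (
            SELECT  COUNT(*)
            FROM    bookvisit bv
            WHERE   bv.visit_time >
                    (
                    SELECT  taken
                    FROM    snapshot_time
                    )
            )
    ORDER BY
            total DESC
    LIMIT 4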
This query returns accurate visit counts.
The main idea is that you only need to scan the records in bookvisit
that were collected after the statistics were cached.
More than that: you don't even have to scan all records in the cached statistics. Since the number of visits only grows, you only need to scan the rows that can possibly make it into the top four.
If the 4th record has 1,000,000 page views in the snapshot, and 1,000 page views happened after you took the snapshot, you only need to select the records from the snapshot with cnt >= 999,000. The other records cannot reach the top four, since that would take more than 1,000 new page views.
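To see how much the cutoff actually prunes on your data, a diagnostic query along these lines might help. It is only a sketch; the aliases q, fourth_cnt, new_visits, cutoff and candidates are made up for illustration.

```sql
SELECT  q.fourth_cnt,
        q.new_visits,
        q.fourth_cnt - q.new_visits AS cutoff,
        (
        -- snapshot rows that can still reach the top four
        SELECT  COUNT(*)
        FROM    snapshot s
        WHERE   s.cnt >= q.fourth_cnt - q.new_visits
        ) AS candidates
FROM    (
        SELECT  (
                -- 4th highest cached count
                SELECT  cnt
                FROM    snapshot
                ORDER BY
                        cnt DESC
                LIMIT 3, 1
                ) AS fourth_cnt,
                (
                -- visits recorded after the snapshot
                SELECT  COUNT(*)
                FROM    bookvisit
                WHERE   visit_time >
                        (
                        SELECT  taken
                        FROM    snapshot_time
                        )
                ) AS new_visits
        ) q;
```

With the figures from the example, fourth_cnt would be 1,000,000, new_visits 1,000 and cutoff 999,000. If candidates grows close to the total number of categories, the snapshot is probably stale and should be refreshed more often.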
The only problem is that books can be deleted or moved to another category. In that case you would need to recalculate the statistics from scratch or fall back to your original method.
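A full recalculation could look roughly like the sketch below. It assumes transactional (InnoDB) tables and a snapshot table that has a snapshot_taken column. An upsert alone is not enough here, because it never removes categories that no longer exist.

```sql
START TRANSACTION;

-- DELETE rather than TRUNCATE: TRUNCATE TABLE causes an implicit commit in MySQL
DELETE FROM snapshot;

UPDATE  snapshot_time
SET     taken = NOW();

-- Rebuild the cached counts from the current state of book and bookvisit
INSERT
INTO    snapshot (category, cnt, snapshot_taken)
SELECT  b.category, COUNT(*), t.taken
FROM    bookvisit bv
JOIN    book b
ON      b.isbn = bv.isbn
CROSS JOIN
        snapshot_time t
WHERE   bv.visit_time <= t.taken
GROUP BY
        b.category, t.taken;

COMMIT;
```

Running everything in one transaction means readers see either the old snapshot or the new one, never an empty table.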