views: 232
answers: 7

I'm still learning about MySQL. I may be making a very basic error, and I'm prepared to be chastened here...

What this query is trying to do is select the top members from our website based on a count of the number of book and recipe reviews they have made.

I'm calculating the total in the SQL query itself. The query is slow (9 seconds) and clearly won't scale: we only have 400 members and a few thousand reviews so far, and the site is growing quickly.

I presume it's doing a full table scan here, and that the calculation is slowing it down, but I don't know of an alternative way to do this and would love some wisdom.

Here's the SQL statement:

SELECT users.*,
       COUNT( DISTINCT bookshelf.ID ) AS titles,
       COUNT( DISTINCT book_reviews.ID ) AS bookreviews,
       COUNT( DISTINCT recipe_reviews.ID ) AS numreviews,
       COUNT( DISTINCT book_reviews.ID ) + COUNT( DISTINCT recipe_reviews.ID ) AS reviewtotal
FROM users
LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
LEFT OUTER JOIN bookshelf ON users.ID = bookshelf.user_id
GROUP BY users.ID
ORDER BY reviewtotal DESC
LIMIT 8

Here is the EXPLAIN output:

+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| id | select_type | table          | type  | possible_keys     | key               | key_len | ref                 | rows | Extra                           |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
|  1 | SIMPLE      | users          | index | NULL              | PRIMARY           | 4       | NULL                |  414 | Using temporary; Using filesort | 
|  1 | SIMPLE      | recipe_reviews | ref   | recipe_reviews_fk | recipe_reviews_fk | 5       | users.ID            |   12 |                                 | 
|  1 | SIMPLE      | book_reviews   | ref   | user_id           | user_id           | 5       | users.ID            |    4 |                                 | 
|  1 | SIMPLE      | bookshelf      | ref   | recipe_reviews_fk | recipe_reviews_fk | 5       | users.ID            |   13 |                                 | 
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+

UPDATE & SOLVED:

I realized, and @recursive confirmed, that the query itself is the root of the problem: the three joins produce a Cartesian product. I rewrote it as a series of subqueries, and the final working code is here:

SELECT  *, bookreviews + recipereviews AS totalreviews
FROM (SELECT users.*,
            (SELECT count(*) FROM bookshelf WHERE bookshelf.user_id = users.ID) as titles,
            (SELECT count(*) FROM book_reviews WHERE book_reviews.user_id = users.ID) as bookreviews,
            (SELECT count(*) FROM recipe_reviews WHERE recipe_reviews.user_id = users.ID) as recipereviews
    FROM users) q

This gives me a result in milliseconds. There are also ways to do this with JOINs. See http://stackoverflow.com/questions/2042414/how-to-add-together-the-results-of-several-subqueries if you want to follow this up.

+3  A: 

For features like this, it always helps to use some kind of caching.

It might already be enough to compute the sums for all users on a nightly basis and store them with each user record. That alone will speed things up considerably.

You should also cache the result of this query for a minute or five, since the same query runs for everyone regardless of who is logged in.
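A minimal sketch of the nightly batch described here, using Python's sqlite3 in place of MySQL; `review_total` is an assumed column name, not one from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE users (ID INTEGER PRIMARY KEY, name TEXT,
                    review_total INT DEFAULT 0);
CREATE TABLE book_reviews (ID INTEGER PRIMARY KEY, user_id INT);
CREATE TABLE recipe_reviews (ID INTEGER PRIMARY KEY, user_id INT);
INSERT INTO users (ID, name) VALUES (1, 'bob'), (2, 'jane');
INSERT INTO book_reviews (user_id) VALUES (1), (1), (2);
INSERT INTO recipe_reviews (user_id) VALUES (2), (2);
""")

# The nightly job: one UPDATE refreshes every user's cached total.
c.execute("""
UPDATE users SET review_total =
    (SELECT count(*) FROM book_reviews WHERE user_id = users.ID)
  + (SELECT count(*) FROM recipe_reviews WHERE user_id = users.ID)
""")

# The page itself then needs only a trivial indexed/sorted read.
top = c.execute(
    "SELECT name, review_total FROM users ORDER BY review_total DESC"
).fetchall()
# top -> [('jane', 3), ('bob', 2)]
```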

Ralf
I suggest you also store an "as of" date alongside that batch-calculated sum.
Mike Atlas
A: 

I often find that creating a smaller temporary table from a larger table gives noticeable speed benefits.

So the basic process:

  1. store query (with joins) into temporary table
  2. run count/summary queries on temporary table
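The two steps above can be sketched with Python's sqlite3 (the tiny schema and sample data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE users (ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book_reviews (ID INTEGER PRIMARY KEY, user_id INT);
INSERT INTO users VALUES (1, 'bob'), (2, 'jane');
INSERT INTO book_reviews (user_id) VALUES (1), (1), (2);
""")

# Step 1: store the query (with joins) into a temporary table.
c.execute("""
CREATE TEMPORARY TABLE tmp_reviews AS
SELECT users.ID AS user_id, users.name, book_reviews.ID AS review_id
FROM users
LEFT JOIN book_reviews ON book_reviews.user_id = users.ID
""")

# Step 2: run the count/summary query on the smaller temporary table.
counts = c.execute("""
SELECT name, COUNT(review_id)
FROM tmp_reviews
GROUP BY user_id
ORDER BY user_id
""").fetchall()
# counts -> [('bob', 2), ('jane', 1)]
```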
John M
+2  A: 

You might try seeing if there is an improvement from removing the DISTINCT modifiers. Assuming the DISTINCTed fields are primary keys anyway, this could be causing unnecessary work.

recursive
I tried this and ended up with a count of thousands for each of the count fields.
mandel
It sounds like you might have duplicate records in your database. Have you checked your tables to see if they make sense?
recursive
I'll review the tables to make sure - perhaps I need to make primary keys on some of these as combinations of fields rather than a straight ID. Bookshelf, for example, has ID, user_id, cookbook_id. The combo of user_id and cookbook_id should be unique...
mandel
A single column "artificial" primary key is not a bad idea, but they should already be guaranteed to be unique, so the fact that you get different results after removing the DISTINCT suggests database problems to me.
recursive
There are no other database issues that I've found so far - every other query using these tables works as expected. I'm going to review the logic here and see if I'm missing something. I'm also going to try a caching/calculated field.
mandel
I figured it out. You're doing a cartesian join across the reviews and the bookshelf. If you used a separate query for each type of review, I suspect your query would speed up by orders of magnitude.
recursive
YES! This is the problem. The only question left is how to add up the results to get a field which aggregates book reviews and recipe reviews.
mandel
+2  A: 

Index all tables on user_id. This could easily speed this query up by orders of magnitude if it hasn't been done yet.
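For reference, a small sqlite3 illustration (standing in for MySQL) that an index on user_id lets this kind of count be answered from the index rather than a full table scan; the index name here is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.executescript("""
CREATE TABLE book_reviews (ID INTEGER PRIMARY KEY, user_id INT);
CREATE INDEX idx_book_reviews_user ON book_reviews (user_id);
""")

# Ask the planner how it would execute the per-user count.
plan = c.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM book_reviews WHERE user_id = 1"
).fetchall()
# The plan's detail text names the index instead of a table scan.
```

The MySQL syntax is the same (`CREATE INDEX idx_name ON table (column)`), and its EXPLAIN output should likewise switch from `ALL` (full scan) to `ref` on the new index.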

recursive
Alas, there are already indexes on each of the user_id fields.
mandel
A: 

Why not just store the number of reviews per user as a column in the users table? Every new review a user posts should then also increment the review count on their user record by one.

For example:

user_id user_name number_of_reviews
1       bob       5
2       jane      10

Bob puts a new review in, and you up his number to 6:

review_id user_id review_text
16        1       "Great!"

user_id user_name number_of_reviews
1       bob       6
2       jane      10

Now you can simply get the top 5 reviewers like this:

SELECT * FROM users ORDER BY number_of_reviews DESC LIMIT 5
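A sketch of this counter-column approach, using Python's sqlite3 and the column names from the example above. Wrapping the insert and the increment in one transaction keeps the count from drifting if either statement fails:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT,
                    number_of_reviews INT DEFAULT 0);
CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, user_id INT,
                      review_text TEXT);
INSERT INTO users VALUES (1, 'bob', 5), (2, 'jane', 10);
""")

def add_review(user_id, text):
    # One transaction: both statements commit together, or neither does.
    with conn:
        conn.execute(
            "INSERT INTO reviews (user_id, review_text) VALUES (?, ?)",
            (user_id, text))
        conn.execute(
            "UPDATE users SET number_of_reviews = number_of_reviews + 1 "
            "WHERE user_id = ?", (user_id,))

add_review(1, "Great!")  # bob goes from 5 to 6

top = conn.execute(
    "SELECT user_name, number_of_reviews FROM users "
    "ORDER BY number_of_reviews DESC LIMIT 5"
).fetchall()
# top -> [('jane', 10), ('bob', 6)]
```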
Mike Atlas
I'd considered something like this early in my design of the website and been told (on SO) that I shouldn't rely on incremented columns over queries. But that may have been a more general caution as I had started using incremented columns for a number of things.
mandel
I can't think of a scenario in your design where this (number_of_reviews) would be risky. If it represented real physical inventory or money amounts, I would advise a bit more caution, but otherwise this should suffice. Don't make it hard on yourself!
Mike Atlas
Also, if you were ever in doubt that the count was off, you could recalculate the number of reviews per user on an "offline" database copy to see if there is any difference, by doing what you were doing above (count * with a join).
Mike Atlas
Incremented counter columns are generally a terrible idea that inevitably leads to data inconsistency. They might be considered on very large databases with many reads, but your DB is small, so you should certainly avoid them.
eckesicle
Can you give some examples of how in this case it would "lead to" data inconsistency?
Mike Atlas
I will dismiss your comment as a drive-by job then...
Mike Atlas
+1  A: 

You are trying to accomplish too many things with this query, and I see problems with your db / query design. Why do you have a user_id in bookshelf? How about the following table structure:

CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY (id)
)

CREATE TABLE recipe_reviews (
  id INT NOT NULL AUTO_INCREMENT,
  review VARCHAR(20),
  user_id INT,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users(id)
)

CREATE TABLE bookshelf (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY (id)
)

CREATE TABLE book_reviews (
  id INT NOT NULL AUTO_INCREMENT,
  review VARCHAR(20),
  user_id INT,
  bookshelf_id INT,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users(id),
  FOREIGN KEY (bookshelf_id) REFERENCES bookshelf(id)
)

If you want to aggregate on users, here is your query :

SELECT users.*,
       COUNT( book_reviews.ID ) AS bookreviews,
       COUNT( recipe_reviews.ID ) AS recipereviews,
       COUNT( book_reviews.ID ) + COUNT( recipe_reviews.ID ) AS reviewtotal
    FROM users
    LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
    LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
    GROUP BY users.ID
    ORDER BY reviewtotal DESC

You could also aggregate on both users and books, but then including recipe_reviews doesn't make sense.

PS: you don't need the DISTINCT, as the keys already take care of uniqueness.

Dave.Sol
Thanks for your thoughts. However, there's a user_id in bookshelf because each user has their own bookshelf to which they can add any book on the site, so there has to be an association with the user_id to find out how many books each user has on their shelf. As for the foreign keys, I'm using MyISAM tables for these, so I can't use FKs. Would switching to InnoDB and FKs make a real difference to performance?
mandel
Foreign keys would be a hit on the performance in general because of the constraint checks that have to be performed during insert (and possibly update/delete). But specifically for data retrieval with this query, I don't see any difference as you have the indices. I would go for InnoDB though - at least for the purposes of data integrity.
Dave.Sol
+2  A: 

You need to create indexes on user_id (clustered indexes, if possible).

Are you sure you have done this? Remember that having a foreign key does not automatically generate an index on that key.

If you are joining 4 B-Trees of 1k rows each, this should surely not take 9s, but a few milliseconds.

The long execution time indicates that you are performing table scans for each user.

I'm pretty convinced this is the correct answer.

Your query is fine except that you are COUNTing your reviews twice; build reviewtotal from the bookreviews and numreviews counts instead of counting them again.

eckesicle