A: 

That's one hell of a query... you should probably edit your question and reformat the query so it's a bit more readable, although given how complex it is, I'm not sure that's possible.

Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the list table itself in indexed decimal columns. All those subqueries are killing you.

The hard part, and what you'll have to figure out, is how to keep those averages updated. A generally easy way is to store the count of all items and the sum of all their values in two separate fields. Any time a value is added, increment the count by 1 and the sum by the new value, then update the table so that avg_field = sum_field / count_field.
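A minimal sketch of that bookkeeping, assuming hypothetical rating_count, rating_sum, and avg_rating cache columns added to lists (the column names and the $list_id placeholder are made up; none of this is in the original schema):

UPDATE lists
SET    rating_count = rating_count + 1,
       rating_sum   = rating_sum + 4.5,          -- 4.5 stands in for the new rating
       avg_rating   = rating_sum / rating_count  -- MySQL evaluates single-table UPDATE
                                                 -- assignments left to right, so this
                                                 -- division sees the updated values
WHERE  object_id = $list_id;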

Langdon
This should be a comment to the question, not an answer...
Seb
Except I did answer him... he needs to remove the subqueries. There's no way to fix the provided query without refactoring... plain and simple.
Langdon
+1  A: 

Here's how to find the bottleneck:

Add the keyword EXPLAIN before the SELECT. This will cause the engine to output how the SELECT was performed.
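For example, with a simplified stand-in for the real query (same tables and columns as in the question):

EXPLAIN
SELECT    l.object_id, AVG(i.rating)
FROM      lists l
JOIN      list_items li ON li.list_id = l.object_id
JOIN      items i ON i.object_id = li.object_id
WHERE     l.user_id = $user_id AND l.status = 'A'
GROUP BY  l.object_id;

In the output, type = ALL flags a full table scan and key = NULL means no index was chosen; those rows are the first place to look.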

To learn more about query optimization with this method, see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

Todd Moses
+1  A: 

A couple of things to consider:

  1. Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id = lists.object_id in several places; list_id and object_id should both have indexes on them (see the index sketch after this list).

  2. Have you done any research as to how much the averages actually vary? You might benefit from having a worker thread (or cron job) calculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table, of course (a sketch of that also follows the list).

  3. Also, are you using status as an enum or a varchar? The cardinality of an enum would be much lower; consider switching to that type if you have a limited range of values for the status column.
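For point 1, a minimal sketch of the index DDL (assuming none of these indexes exist yet; the index names are made up):

CREATE INDEX idx_list_items_list_id  ON list_items (list_id);
CREATE INDEX idx_items_object_status ON items (object_id, status);
CREATE INDEX idx_lists_user_status   ON lists (user_id, status);

lists.object_id is presumably already the primary key, so it needs no extra index.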
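For point 2, the periodic job could be a single aggregate pass into a hypothetical list_averages cache table (with list_id as its primary key):

REPLACE INTO list_averages (list_id, avg_rating)
SELECT    li.list_id, AVG(i.rating)
FROM      list_items li
JOIN      items i ON i.object_id = li.object_id AND i.status = 'A'
GROUP BY  li.list_id;

REPLACE INTO is MySQL-specific; on another engine an upsert or a truncate-and-reload would do the same job.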

-aj

AJ
I agree with point 2. The struggle between data normalization and performance is a tough one.
jason
Although I disagree with point 3. The difference between indexed enum and varchar fields is almost nil. Read: http://www.mysqlperformanceblog.com/2008/01/24/enum-fields-vs-varchar-vs-int-joined-table-what-is-faster/
jason
Interesting article...thanks for this pointer.
AJ
Ah, "The costly illusion"... http://www.dbdebunk.com/page/page/622348.htm
MaD70
+6  A: 

It looks like you can replace all the subqueries with joins:

SELECT     l.object_id,
           l.user_id,
           <other columns from lists>,
           AVG(i.rating) as avgrating,
           AVG(i.avg_rating) as avgavgrating,
           <other averages>
FROM       lists l
LEFT JOIN  list_items li 
ON         li.list_id = l.object_id
LEFT JOIN  items i 
ON         i.object_id = li.object_id
           AND i.status = 'A'
WHERE      l.user_id = $user_id AND l.status = 'A' 
GROUP BY   l.object_id, l.user_id, <other columns from lists>

That would save a lot of work for the DB engine.

Andomar
This is probably the best answer so far. Eventually, though, given enough data, performance of this query will degrade. Caching the averages is going to be necessary sooner or later.
Langdon
@Langdon: It looks like they're only averaging for a single user, which would be pretty fast with the right indexes
Andomar
Thanks so much Andomar! Worked like a charm. The query now returns in constant time since we have good indexing on all the tables. It was the convoluted way the original query was written that was killing the DB.
Russell C.
Why people like writing such monstrosities in SQL escapes me.
MaD70
It's a combination of laziness and a lack of understanding. The answer to this question will hopefully help me avoid doing anything like this in the future.
Russell C.
Well, I confess that I don't particularly like this answer either: how do you know that it is correct? SQL has corner cases in which even something that seems obviously right is not (hint: the demented treatment of NULLs). I prefer to construct a query incrementally, each piece simple to understand and to check for correctness. That's why I often suggest using views.
MaD70
@MaD70: This is *not* a complex SQL query -- it only has two joins.
Bill Karwin
@Bill Karwin: just because you people are addicted to joins does not mean that it is easy to reason about joins. Languages based on relational *calculi* are better in this respect. Codd himself was in favor of **not** exposing the algebra directly (and the fact that SQL is a mix of the two is even worse).
MaD70
A: 

Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language; it admits too many equivalent, syntactically different expressions - a known and documented problem, see for example SQL redundancy and DBMS performance, by Fabian Pascal).

I will rewrite your query below to highlight that (the let ... in binding is not SQL, just notation to name the repeated subquery):

let LI =

  select object_id from list_items where list_id=lists.object_id

in

  select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
         (select AVG(rating)            from items where object_id IN LI AND status="A") as 'avg_rating',
         (select AVG(avg_rating)        from items where object_id IN LI AND status="A") as 'avg_avg_rating',
         (select AVG(length)            from items where object_id IN LI AND status="A") as 'avg_length',
         (select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
    from lists
   where user_id=$user_id AND status="A"
order by $orderby
   LIMIT $start, $step

Note: this is only the first step in refactoring that beast.

I wonder why people rarely - if ever - use views, even if only to simplify SQL queries. They help in writing more manageable and refactorable queries.
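To illustrate, a sketch of the view-based approach (the view name is made up; column names are taken from the query above):

CREATE VIEW active_list_items AS
SELECT li.list_id, i.rating, i.avg_rating, i.length, i.difficulty_rating
FROM   list_items li
JOIN   items i ON i.object_id = li.object_id
WHERE  i.status = 'A';

SELECT    l.object_id, l.user_id, l.slug, l.title, l.description,
          AVG(ali.rating)            AS avg_rating,
          AVG(ali.avg_rating)        AS avg_avg_rating,
          AVG(ali.length)            AS avg_length,
          AVG(ali.difficulty_rating) AS avg_difficulty
FROM      lists l
LEFT JOIN active_list_items ali ON ali.list_id = l.object_id
WHERE     l.user_id = $user_id AND l.status = 'A'
GROUP BY  l.object_id, l.user_id, l.slug, l.title, l.description
ORDER BY  $orderby
LIMIT     $start, $step;

Each piece is simple enough to check for correctness on its own, which is the point.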

MaD70