tags:

views:

271

answers:

6

The following query hangs: (although subqueries perfomed separately are fine)

I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.

select
sum(grades.points)) as p,  
from assignments 
left join grades using (assignmentID) 
where gradeID IN 

(select grades.gradeID 
from assignments 
left join grades using (assignmentID) 
where ... grades.date <= '1255503600' AND grades.date >= '984902400' 
group by     assignmentID order by grades.date DESC);

I think the problem is with the first grades table... the type ALL with that many rows seems to be the cause.. Everything is indexed.

I uploaded the table as an image. Couldn't get the formatting right: http://imgur.com/AjX34.png

A commenter wanted the full where clause:

explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t 
from assignments left join grades using (assignmentID) 
where gradeID IN 
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type 
where assignments.classID = '7815' 
and (assignments.type = 30170 ) 
and grades.contactID = 7141 
and grades.points REGEXP '^[-]?[0-9]+[-]?' 
and grades.points != '-' 
and grades.points != '' 
and (grades.pointsposs IS NULL or grades.pointsposs = '') 
and grades.date <= '1255503600' 
AND grades.date >= '984902400' 
group by assignmentID 
order by grades.date DESC);
A: 

There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?

Having said that, if there are too many terms in an in clause, you can see seriously degraded performance. Replace the use of in with a right join.

For starters, the table *as_types* in the in clause is not used. Left joining it serves no purpose so get rid of it.

That leaves the in clause having only the assignments and grades table from the outer query. Clearly the wheres the modify assignments belong in the where clause for the outer query. You should move all of the where grades=whatever into the on clause of the left join to grades.

stevedbrown
The tables are really big (thousands to hundreds of thousands). Indexes are on everything that I use in where clauses.... is in place of 5 or 6 irrelevant where statements that I thought would clutter it up. I'll append it to the end.
Stephane
I've been trying to find how to do what you're suggesting:Select stuff from table right join (select stuff) but I don't understand how to relate what's in the parenthesis to the outer query.
Stephane
It's probably not a right join, it''s probably a left join.
stevedbrown
A: 

If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:

select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t 
from assignments left join grades using (assignmentID) 
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type 
where assignments.classID = '7815' 
and (assignments.type = 30170 ) 
and grades.contactID = 7141 
and grades.points REGEXP '^[-]?[0-9]+[-]?' 
and grades.points != '-' 
and grades.points != '' 
and (grades.pointsposs IS NULL or grades.pointsposs = '') 
and grades.date <= '1255503600' 
AND grades.date >= '984902400' 
group by assignmentID 
order by grades.date DESC) using (gradeID);
Ian Clelland
+1  A: 

Super messy, but: (thanks for everyone's help)

   SELECT * 
   FROM grades
   LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
   RIGHT JOIN (

   SELECT g.gradeID
 FROM assignments a
 LEFT JOIN grades g
 USING ( assignmentID ) 
 WHERE a.classID =  '7815'
 AND (
 a.type =30170
 )
 AND g.contactID =7141
  g.points
 REGEXP  '^[-]?[0-9]+[-]?'
 AND g.points !=  '-'
 AND g.points !=  ''
 AND (
 g.pointsposs IS NULL 
 OR g.pointsposs =  ''
 )
 AND g.date <=  '1255503600'
 AND g.date >=  '984902400'
 GROUP BY assignmentID
 ORDER BY g.date DESC
 ) AS t1 ON t1.gradeID = grades.gradeID
Stephane
A: 

The query is a little tough to follow, but I suspect that the subquery isn't necessary at all. It seems like your query is basically thus:

SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)  
WHERE gradeID IN 
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)  
WHERE your_conditions = TRUE
);

But, you're not doing anything really fancy in the where clause in the subquery. I suspect something more like

SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)  
GROUP BY groupings
WHERE your_conditions_with_some_tweaks = TRUE;

would work just as well.

If I'm missing some key logic here please comment back and I'll edit/delete this post.

anschauung
+1  A: 

See "The unbearable slowness of IN": http://www.artfulsoftware.com/infotree/queries.php#568

jedihawk
A: 

Suppose you use a Real Database (ie, any database except MySQL, but I'll use Postgres as an example) to do this query :

SELECT * FROM ta WHERE aid IN (SELECT subquery)

a Real Database would look at the subquery and estimate its rowcount :

  • If the rowcount is small (say, less than a few millions)

It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().

Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.

  • If however the subquery rowcount is quite large

The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.

However, you are using MySQL. In this case, it will not do any of this (it is gonna re-execute the subquery for each row of your table) so it will take 1000 years. Sorry.

peufeu