views:

220

answers:

2

I have this mysql query that I am trying to analyze. It is very slow, the visitor table here is about 50K entries, this query never returns. When I tried an explain statement, I found out that the index is not being used on the visitor table, In spite of the index being available. Now this is the great puzzle I need help solving. Any hints appreciated.

Query:

select distinct
  visitor0_.ID as ID130_,      

  case when visitor0_1_.id is not null then 1 when
  visitor0_.ID is not null then 0
  end as clazz_

from Visitor visitor0_ 
left outer join Operator visitor0_1_ on visitor0_.ID=visitor0_1_.id
where (visitor0_.ID not in
    (select operator1_.id 
     from Operator operator1_ 
     inner join Visitor operator1_1_ on operator1_.id=operator1_1_.ID))
  and (exists 
    (select visitorpro2_.ID 
     from VisitorProfileField visitorpro2_, ProfileField profilefie3_ 
     where visitorpro2_.profileFieldID=profilefie3_.ID 
       and visitorpro2_.visitorID=visitor0_.ID 
       and profilefie3_.name='subscription86' 
       and visitorpro2_.numberVal=1 
       and visitorpro2_.stringVal='Manual'))

Explain output screen shot: http://grab.by/grabs/9c3a629a25fc4e9ec0fa54355d4a092c.png

+1  A: 

You're query is... big. Can you explain what it acomplishes for you? It looks like it pulls each visitor ID and whether or not they are an operator where they are not an operator and they have a specific profile setup. That doesn't make a ton of sense, so I must be missing something there.

Here's my attempt, based on my understanding of what you're trying to do:

select distinct visitor.ID, IF(operator.id IS NOT NULL, 1, 0) AS clazz
from Visitor left outer join Operator on visitor.ID = operator.id
where not exists 
    (select 'x' from Operator OperatorTwo where OperatorTwo.id = visitor.ID)
and exists
    (select 'x' from VisitorProfileField, ProfileField
        where VisitorProfileField.profileFieldID = ProfileField.ID
        and VisitorProfileField.profileFieldID.visitorID = visitor.ID
        and VisitorProfileField.profileFieldID.numberVal = 1
        and VisitorProfileField.profileFieldID.stringVal = 'Manual'
        and ProfileField .name = 'subscription86')

The joined table named "operator1_1_" doesn't appear to be used, you should be able to remove that. If you are using it just to make sure there is a record for the visitor in that table, I'd use an exists instead of a join. I dropped that.

I've switched your not in to a not exists, which I think may be easier for MySQL to optimize. I used an IF instead of a case because you only have two, and it was shorter to type. I don't know if either one is faster/easier on MySQL.

I can tell you that in my experience MySQL performance dies with subqueries in suqueries. It seems to give up optimising them and starts running them row by row. I bet that if you used a temporary table of results (just for testing purposes) you would find your query runs much faster.

Edit:

Bill went further than I did, I didn't go far enough. I like Bill's query and agree with his conclusions about the CASE statement, which was kind of throwing me.

MBCook
MBCook, thanks for your answer. Please refer to my reply to Bill where I have simplified the query much more. It just tried to pick up visitors who are not operators and who have subscribed to a mailing list.
Senthil
+2  A: 

From what I infer of your query, the following should produce the same result, with no subqueries and a lot faster performance.

select v.ID as ID130_, 0 as clazz_
from Visitor v
left outer join (VisitorProfileField vpf join ProfileField pf 
                   on vpf.profileFieldID = pf.ID)
  on v.ID = vpf.visitorID and pf.name='subscription86' 
    and vpf.numberVal=1 and vpf.stringVal='Manual'
left outer join Operator o on v.ID = o.ID
where o.ID IS NULL;

Please explain if I got some of it wrong. It appears that your NOT IN predicate excludes any Visitor id's that match any id's in Operator. That is, the subquery generates a list of all id's that are in both tables, so the NOT IN condition is equivalent to an outer join to Operator and a simple test where o.ID IS NULL.

This means the CASE expression in your select-list is meaningless, since it will certainly be 0 if your conditions match only Visitor rows that don't match any rows in Operator.

I think something is seriously confused in your query.

Also, it appears that you're using the EAV antipattern in the VisitorProfileField and ProfileField tables. This is going to cause you a lot of trouble.

Bill Karwin
These queries are not generated by me by hand. I am getting these from a java application written using hibernate. I do realize that their is a redundancy with the outer join like you mentioned. But I just liked to leave it in since I can't much do to change the original application source code. But in the end if everything fails I will have to go down that path. I tried to simplify this query and reduce it to the simplest possible similar query which also does not work. and here it is in the next comment.
Senthil
select id from Visitor primary_v where id not in (select Operator.id from Operator) and ((exists (select a.ID from VisitorProfileField a, ProfileField b where a.profileFieldID=b.ID and a.visitorID=primary_v.ID and b.name='subscription86' and b.numberVal=1 and a.stringVal='Manual')))
Senthil
Note that when try the first select alone before the first 'and', it does use the index. Only when I add in the exists statement does the whole thing break down.
Senthil
You should be able to replace the exist with a join and distinct. Try: select distinct id from Visitor primary_v, VisitorProfileField a, ProfileField b where a.profileFieldID=b.ID and a.visitorID=primary_v.ID and b.name='subscription86' and b.numberVal=1 and a.stringVal='Manual' and primary_v.id not in (select Operator.id from Operator)
MBCook