views:

140

answers:

2

I have the following two queries:

select   count(*) 
from     segmentation_cycle_recipients scr
         , segmentation_instance si 
where    si.access_code=scr.access_code 
         and si.segment_id is NOT NULL;

Returns 13429 rows in 0.2 secs

2)

select   count(*) 
from     segmentation_cycle_recipients scr
        , segmentation_instance si, web_pat_info wpi 
where    si.access_code=scr.access_code and scr.siebel_row_id=wpi.siebel_id 
         and si.segment_id is NOT NULL;

Returns 4003 rows in 0.48 secs

Now, I want 1)-2) So I wrote the following query:

select   count(*) 
from     segmentation_cycle_recipients scr
         , segmentation_instance si 
where    si.access_code=scr.access_code 
         and si.segment_id is NOT NULL 
         and scr.siebel_row_id NOT IN (select scr.siebel_row_id 
from     segmentation_cycle_recipients scr
         , segmentation_instance si
         , web_pat_info wpi where si.access_code=scr.access_code 
        and scr.siebel_row_id=wpi.siebel_id and si.segment_id is NOT NULL);

I was expecting 13429-4003=9426 rows but the query takes forever (have to kill the query command) to execute. It even adds a counter in the "slow queries" listing in mysql>status;)

It returns < 100ms in development environment where the result set is much smaller. So I believe the query itself is right.

I believe, using NOT IN is a known performance problem in Mysql (Oracle has the MINUS operator). Any suggestions on how to improve the performance on this query?

+2  A: 
SELECT  COUNT(*)
FROM    segmentation_cycle_recipients scr
JOIN    segmentation_instancs si
ON      si.access_code = scr.access_code
LEFT JOIN
        web_pat_info wpi 
ON      wpi.siebel_id = scr.siebel_row_id
WHERE   wpi.siebel_id IS NULL
        AND si.segment_id is NOT NULL

Make sure that si.access_code and wpi.siebel_id are indexed, and wpi.siebel_id is defined as NOT NULL.

If the latter condition does not hold, replace wpi.siebel_id IS NULL in the WHERE clause with any other column defined as NOT NULL.

Quassnoi
+1  A: 

You may be better served using the NOT EXISTS clause.

select   count(*) 
from     segmentation_cycle_recipients scr
         , segmentation_instance si 
where    si.access_code=scr.access_code 
         and si.segment_id is NOT NULL 
         and NOT EXISTS (select scr2.siebel_row_id 
from     segmentation_cycle_recipients scr2
         , segmentation_instance si2
         , web_pat_info wpi2 where si2.access_code=scr2.access_code 
        and scr2.siebel_row_id=wpi2.siebel_id and si2.segment_id is NOT NULL
        and scr.siebel_row_id=scr2.siebel_row_id);
Suvesh Pratapa
NOT EXISTS as same performance problems as NOT IN
Pritam