views:

239

answers:

3

Hi, I have two queries, and I want to understand which is better in terms of performance and memory. I also welcome other alternatives for these two.

Query 1:

SELECT DISTINCT a.no,
   a.id1            ,
   a.id2
   FROM  tbl_b b         ,
         tbl_a a ,
         tbl_c c    ,
         tbl_d d
   WHERE (
          b.id1              = a.id1
      AND a.id1              = c.id1
      AND upper(c.flag)      = 'Y'
      AND c.id1              = d.id1
   )
   OR (
          b.id2              = a.id2
      AND a.id2              = c.id2
      AND upper(c.flag)      = 'Y'
      AND c.id2              = d.id2
)
AND d.id3 = 10

Here tables b and d are very large tables running over 500,000 to millions of rows, while table a is relatively smaller.

My requirement is to pick up only those records from table a, whose id (either id1 or id2) is available in b,c,d tables, satisfying certain other conditions as well.

The alternate query that I have in hand is

Query 2:

SELECT DISTINCT a.no,
   a.id1             ,
   a.id2
   FROM  tbl_a a
   where exists ( select a.id1, a.id2 from           
          tbl_c c where ((a.id1 = c.id1 or a.id2 = c.id2)
          AND upper(c.active_flag) = 'Y'))
   and exists ( select a.id1, a.id2 from 
          tbl_b b where  b.id1 = a.id1 or b.id2 = a.id2)
   and exists ( select  a.id1, a.id2 from tbl_d d
               where (a.id1 = d.id1 or a.id2 = d.id2)
               AND d.id3 = 10)

Which is best performance-wise? I understand that query two occupies lesser space than query 1. But how do I select the best one?

+4  A: 

The way to select the best one is to try both with realistic data and see which performs best. If they are logically equivalent queries, then it is quite likely that the optimizer will come up with the same plan for both anyway, which you will be able to see by using AUTOTRACE or TKPROF or similar tools.

Tony Andrews
Hi Tony, my mistake that was.. I missed the paranthesis. I have now updated the query, it is ( (a or b) and c ) in my condition.
OracleNewbie
OK, I have removed my comments about parentheses. I'm also happier now that the two queries are logically equivalent.
Tony Andrews
+1, always try it yourself when feasible.
DCookie
+1  A: 

Update:

See this article in my blog for performance details:

This one will be the most efficient:

SELECT  a.no,
        a.id1,
        a.id2
FROM    tbl_a a
WHERE   EXISTS
        (
        SELECT  1
        FROM    tbl_b b
        WHERE   b.id1 = a.id1
        )
        AND EXISTS
        (
        SELECT  1
        FROM    tbl_c c
        WHERE   c.id1 = a.id1
                AND UPPER(c.flag) = 'Y'
        )
        AND EXISTS
        (
        SELECT  1
        FROM    tbl_d d
        WHERE   d.id1 = a.id1
                AND d.id3 = 10
        )
UNION
SELECT  a.no,
        a.id1,
        a.id2
FROM    tbl_a a
WHERE   EXISTS
        (
        SELECT  1
        FROM    tbl_b b
        WHERE   b.id2 = a.id2
        )
        AND EXISTS
        (
        SELECT  1
        FROM    tbl_c c
        WHERE   c.id2 = a.id2
                AND UPPER(c.flag) = 'Y'
        )
        AND EXISTS
        (
        SELECT  1
        FROM    tbl_d d
        WHERE   d.id2 = a.id2
                AND d.id3 = 10
        )

Oracle is not too good in optimizing OR conditions.

Separating the queries should help the optimizer to use two different execution plans for each of the OR'ed parts.

You should create the following indexes:

tbl_b (id1)
tbl_b (id2)
tbl_c (id1, UPPER(flag))
tbl_c (id2, UPPER(flag))
tbl_d (id1, id3)
tbl_d (id2, id3)

Note the function-based index on UPPER(flag), you should create it exactly this way.

Quassnoi
You changed the query - ('y','Y') != ('Y') Union causes a sort on the returned result sets. The first select is missing "AND d.id3 = 10"
@glenn: the original query said UPPER(flag) = 'Y', which is equivalent to flag IN ('y', 'Y')
Quassnoi
@glenn: UNION does not necessarily causes a sort, this depends on the method chosen by the optimizer: SORT UNIQUE or HASH UNIQUE, the latter doesn't sort
Quassnoi
+1, nice illustration.
DCookie
+1  A: 

There is not enough information to give you a solid answer. Are there any indexes? For TBL_C, what percentage of the records have a FLAG = 'Y' and how many records are in that table? For table TBL_D, how many records with an ID3 = 10? Is the 10 hard coded value or could it be any given supplied value (and you will use bind variables - right?)

Tony is right about using autotrace and tkprof to see which one does the least amount of work. I usually start with getting a SQL_PLAN for each and just run them if the cost is what I expect. With the amount of data and proper indexes either one should not run more then a second or two.

There is enough information to suggest what indexes are needed.
jva
Sure about that? How big are the tables? We know about the number of records but not how large they are. Are the FLAG fields set to yes on 1 percent or 99 percent of the records? Same for TBL_D?I have experienced a lot more problems with queries using indexes when they should be using full scans then the reverse. Indexes are not evil but they do not solve everything either.