views:

97

answers:

5

Suppose master_table contains many records and both the "id" field of the master_table, tableA,tableB,tableC and tableD are the same in the business sense.

For the 2 select statements shown belows ,

Will they both return the same result set?

Which one will have better performance ?

I think if both tableA_tmp ,tableB_tmp,tableC_tmp and tableD_tmp return a smaller result set , SQL1 will be faster than SQL2 because oracle does not need to query tableA_tmp,,tableB_tmp,tableC_tmp and tableD_tmp once for every master_table record.

But if both the tableA_tmp ,tableB_tmp,tableC_tmp and tableD_tmp return the large result set , SQL 2 will be much faster because the cost of joining many large result set is much higher than query tableA_tmp,,tableB_tmp,tableC_tmp and tableD_tmp once for every master_table record.

Please correct me if I have any misunderstanding. Or any others method suggested?

SQL1:

select 
        master_table.* ,
        tableA_tmp.cnt as tableA_cnt , 
        tableB_tmp.cnt as tableB_cnt ,  
        tableC_tmp.cnt as tableC_cnt ,  
        tableD_tmp.cnt as tableD_cnt 
    from
        master_table,
        (select  tableA.id as id, count(1) as cnt  from tableA group by tableA.id) tableA_tmp,
        (select  tableB.id as id, count(1) as cnt from tableB group by tableB.id) tableB_tmp,
        (select  tableC.id as id, count(1) as cnt from tableC group by tableC.id) tableC_tmp,
        (select  tableD.id as id, count(1) as cnt from tableD group by tableD.id) tableD_tmp
    where 
        master_table.id = tableA_tmp.id(+) and
        master_table.id = tableB_tmp.id(+) and
        master_table.id = tableC_tmp.id(+) and
        master_table.id = tableD_tmp.id(+) ;

SQL 2:

   select 
        master_table.* ,
        (select  count(*)  from tableA where tableA.id = master_table.id) as tableA_cnt,
        (select  count(*)  from tableB where tableB.id = master_table.id) as tableB_cnt,
        (select  count(*)  from tableC where tableC.id = master_table.id) as tableC_cnt,
        (select  count(*)  from tableD where tableD.id = master_table.id) as tableD_cnt
    from
        master_table;
+3  A: 

Check out the execution plan

a_horse_with_no_name
+1  A: 

The one who carries out work in smaller period of time.

Vash
+2  A: 

Joins are generally better than inline queries - inline queries get executed for every row that is returned from the main query.

That means (1) is better than (2). In 99% of the cases at least.

In few cases, the distribution of data and way indexes are defined can play a role in tilting the query execution times towards 2 being more efficient, but this happens very rarely in a average database.

Roopesh Shenoy
A: 

Different result sets. Consider if TableA is empty.

In statement 1, tableA_tmp would also be empty. With the outer join, a row would be returned but the value of tableA_tmp.cnt would be null.

In statement 2, the count would be executed and return a zero value.

Performance of either could be better or worse depending on volumes, indexes, scalar subquery caching, table clustering factor, memory....

Gary
A: 

If Master table and other Tables are sufficiently large, SQL 1 will be faster. The reason is table scans. Table scans are expensive as a scan involves I/O. The first query requires only single scan of each of tables tableA, tableB, tablec and tableD.

In SQL2, for every key in master table the tables tableA, tableB, tablec and tableD will be scanned sperately. If master table has 10 rows, table tableA will be scanned 10 times, tableB will be scanned 10 times, tableC will be scanned 10 times and so on. The cost will be high.

Example will illustrate the point. Suppose every table has 1000 records each with distinct id. In first query, the tables tableA, tableB, tablec and tableD will be scanned once each and result will be joined. The resulting number of rows will be large but will be filtered in reasonable time. In SQL2, each table tableA, tableB, tablec and tableD will be scanned 1000 times each. That is very expensive.

Vishnu Gupta