I've got a query joining several tables and returning quite a few columns.

An indexed column of another table references the PK of one of these joined tables. Now I would like to add another column to the query that states if at least one row with that ID exists in the new table.

So if I have one of the old tables

ID
 1
 2
 3

and the new table

REF_ID
1
1
1
3

then I'd like to get

ID   REF_EXISTS
 1            1
 2            0
 3            1

I can think of several ways to do that, but what is the most elegant/efficient one?


EDIT: I tested the performance of the queries provided, using 50,000 records in the old table with every other record matched by two rows in the new table, so half of the records have REF_EXISTS=1.

I'm adding average results as comments to the answers in case anyone is interested. Thanks everyone!
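
A test set like the one described above can be rebuilt locally. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle. The table names old_table and new_table are borrowed from one of the answers, the function names are made up for illustration, and matching the even IDs is an arbitrary choice. Timings on SQLite will not be comparable with the Oracle numbers given in the comments.

```python
import sqlite3

def build_test_data(n=50_000):
    """Create old_table with n rows; every other id gets two rows in new_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE old_table (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE new_table (ref_id INTEGER)")
    conn.execute("CREATE INDEX new_table_ref_id ON new_table (ref_id)")
    conn.executemany("INSERT INTO old_table (id) VALUES (?)",
                     ((i,) for i in range(1, n + 1)))
    # Assumption: the even ids are the matched half, two referencing rows each.
    conn.executemany("INSERT INTO new_table (ref_id) VALUES (?)",
                     ((i,) for i in range(2, n + 1, 2) for _ in range(2)))
    conn.commit()
    return conn

def count_ref_exists(conn):
    """Return (total rows, rows with ref_exists = 1) using an EXISTS check."""
    return conn.execute("""
        SELECT COUNT(*), SUM(ref_exists)
        FROM (SELECT o.id,
                     CASE WHEN EXISTS (SELECT 1 FROM new_table n
                                       WHERE n.ref_id = o.id)
                          THEN 1 ELSE 0 END AS ref_exists
              FROM old_table o)
    """).fetchone()

if __name__ == "__main__":
    conn = build_test_data()
    print(count_ref_exists(conn))  # (50000, 25000)
```

With the default size, half of the 50,000 rows come back with ref_exists = 1, which matches the setup described in the edit.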

+1  A: 

Use:

   SELECT DISTINCT t1.id,
          CASE WHEN t2.ref_id IS NULL THEN 0 ELSE 1 END AS REF_EXISTS
     FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.ref_id = t1.id

Added DISTINCT to ensure only unique rows are displayed.

OMG Ponies
Sorry, but wouldn't that give me several rows for id=1 and none for id=2?
Peter Lang
Thanks, the updated version took approximately 0.17s in my test.
Peter Lang
+1  A: 

A join could return multiple rows for one id, as it does for id=1 in the example data. You can limit it to one row per id with a group by:

SELECT 
    t1.id
,   COUNT(DISTINCT t2.ref_id) as REF_EXISTS
FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.ref_id = t1.id
GROUP BY t1.id

The group by ensures there's only one row per id. And count(distinct t2.ref_id) will be 1 if a row is found and 0 otherwise.
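
As a quick check, the query can be run against the sample data from the question. This is a minimal sketch that assumes Python's built-in sqlite3 module in place of Oracle. The helper name ref_exists_by_group is made up for illustration, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def ref_exists_by_group(ids, ref_ids):
    """Load the two tables into an in-memory database and run the GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE_1 (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE TABLE_2 (ref_id INTEGER)")
    conn.executemany("INSERT INTO TABLE_1 VALUES (?)", [(i,) for i in ids])
    conn.executemany("INSERT INTO TABLE_2 VALUES (?)", [(r,) for r in ref_ids])
    rows = conn.execute("""
        SELECT t1.id, COUNT(DISTINCT t2.ref_id) AS REF_EXISTS
        FROM TABLE_1 t1
        LEFT JOIN TABLE_2 t2 ON t2.ref_id = t1.id
        GROUP BY t1.id
        ORDER BY t1.id
    """).fetchall()
    conn.close()
    return rows

# Sample data from the question: ids 1..3, references to 1 (three times) and 3.
print(ref_exists_by_group([1, 2, 3], [1, 1, 1, 3]))  # [(1, 1), (2, 0), (3, 1)]
```

The three matching rows for id=1 collapse to a single distinct ref_id, so the count is 1, while id=2 only has the NULL from the outer join, which COUNT ignores.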

EDIT: You can rewrite it without a group by, but I doubt that will make things easier:

SELECT 
    t1.id
,   CASE WHEN EXISTS (
        SELECT * FROM TABLE_2 t2 WHERE t2.ref_id = t1.id)
        THEN 1 ELSE 0 END as REF_EXISTS
,   ....
FROM TABLE_1 t1
Andomar
Yes, this would work. But I'd love to avoid the group by, since I'm selecting another 30 columns... Any other ideas?
Peter Lang
My experience says to use the first one for best efficiency. If you have an index on t2.ref_id, Oracle should be pretty smart about using it. Be sure to check with EXPLAIN PLAN as you choose.
Ollie Jones
You are right, the first one was more efficient in my test (0.20s). It did not use the index on t2.ref_id; providing a hint to use it resulted in the same performance (with a different execution plan, though). The second query is the only one provided that needs the index on t2.ref_id (0.25s). When the index does not exist, it takes about 3 minutes :)
Peter Lang
+3  A: 

Another option:

select O.ID
    , case when N.ref_id is not null then 1 else 0 end as ref_exists
from old_table o
left outer join (select distinct ref_id from new_table) N
   on O.id = N.ref_id
Shannon Severance
+1 since you beat me by 7 minutes with this query. This one nicely groups the ref_id values before outer joining the set to old_table. I'd use nvl2(n.ref_id,1,0) instead of your case expression though.
Rob van Wijk
This one is the fastest query, average test time 0.06s. And no need for a GROUP BY :)
Peter Lang
+1  A: 

I would:

select distinct ID,
       case when exists (select 1 from REF_TABLE where ID_TABLE.ID = REF_TABLE.REF_ID)
            then 1 else 0 end
  from ID_TABLE

Provided you have indexes on the PK and FK, you will get away with a table scan and index lookups.

Regards K

Khb
Thanks, took 0.18s in my test.
Peter Lang