views:

68

answers:

2

I would like to write a query like the following

select 
  username, 
  (select state from addresses where addresses.username = users.username) email
from users

This works in Oracle (assuming the result from the inner query is unique). However, is there a performance penalty associated with this style of writing query?

+1  A: 

This isn't a join per se, but a concatenation of a subquery with the main select statement.

In order for this to be a join (which is much faster), you would need the following syntax:

select users.username as username, addresses.state as email from users
left join state on (users.username = addresses.username)

The use of subqueries in a select statement always carries a penalty within the DBMS. Think of each subquery as a separate, temporary table that exists in memory only for the duration of the query, after which the DBMS destroys it. First, it allocates memory for the main select, then it allocates even more memory for the subquery table, performs the subquery in its entirety, then looks at both tables to collate the result set.

Also, temporary tables have no index optimization because they are a derived table and the DBMS has been programmed not to take the time to generate an index (unless the DBMS is really, really smart).

Whereas, with a join, memory is only allocated for the main select result table, and is populated once with the main result set, and added to when the joined table is searched (using any indexes set up for the table).

amphetamachine
i would call it a correlated subquery. the interesting twist is it is in the from clause here - so maybe gets another special name?
Randy
It is called a scalar subquery in Oracle
Gary
"performs the subquery in its entirety, then looks at both tables to collate the result set." This is incorrect. Oracle will get a row in the main resultset, then use the username to access the addresses table. If the username column is indexed in addresses, that index will be used.
Gary
+1  A: 

Going back to the original question, joins are what databases do. If they can't do joins well, they fail miserablyin the market. So you'll find that either of the variations discussed here will be fast.

Note that "users" is a reserved word in Oracle -- you probably want to call your table something else. Also note that you'll raise errors in the original formulation if there are multiple records in addresses for a user. Normal joins will just give you multiple rows back.

Using tables of my own with reasonable indexing and amounts of data, the explain plans were:

Original

SELECT STATEMENT  ALL_ROWSCost: 226  Bytes: 390,570  Cardinality: 39,057        
    2 TABLE ACCESS BY INDEX ROWID TABLE X83109.FN_AR_INVOICE Cost: 2  Bytes: 13  Cardinality: 1     
        1 INDEX UNIQUE SCAN INDEX (UNIQUE) X83109.I_FN_AR_INVOICE Cost: 1  Cardinality: 1  
    3 TABLE ACCESS FULL TABLE X83109.FN_AR_LINE_ITEM Cost: 226  Bytes: 390,570  Cardinality: 39,057  

Either ANSI Join (as above) or traditional Oracle syntax, shown below

select users.username as username, addresses.state as email from users, addresses where users.username = addresses.username; 

SELECT STATEMENT  ALL_ROWSCost: 377  Bytes: 898,311  Cardinality: 39,057        
    3 HASH JOIN  Cost: 377  Bytes: 898,311  Cardinality: 39,057     
        1 TABLE ACCESS FULL TABLE X83109.FN_AR_INVOICE Cost: 149  Bytes: 333,788  Cardinality: 25,676  
        2 TABLE ACCESS FULL TABLE X83109.FN_AR_LINE_ITEM Cost: 226  Bytes: 390,570  Cardinality: 39,057 

So there's not a lot of difference. You should feel free to use joins, but also monitor performance.

Jim Hudson