views:

131

answers:

5

I have a large historical transaction table (15-20 million rows MANY columns) and a table with one row one column. The table with one row contains a date (last processing date) which will be used to pull the data in the trasaction table ('process_date').

Question: Should I inner join the 'process_date' table to the transaction table or the transaction table to the 'process_date' table?

+6  A: 

This is how I would do it

SELECT <<list only columns you need>> 
FROM large_historical_transaction_table t
WHERE EXISTS (SELECT 1 FROM OneRowTable o 
              WHERE o.last_processing_date = t.process_date)
SQLMenace
I'm sure you know what you're talking about, with a name like SQLMenace and a rep of 17k, but on first glance, it looks like correlated subquery will be executed for each row in the large table?
mdma
+1, @mdma, it looks that way, but this is faster than the regular join
KM
Thanks for clarification. I'm interested to get into the details of how it's faster - do you know of links describing this in more detail?
mdma
@mdma: Generally speaking, a `JOIN` implies returning **all** matching rows whereas `EXISTS` only needs to find the first one. Although in this case it doesn't matter because there's only one row in the lookup table to begin with.
Aaronaught
+1, even though I don't get it! @Aaronaught - I kind of see where you are coming from, but here there is only one row, so what's the actual difference in practice?
mdma
I just tried this in SQL server. Using an INNER JOIN and the subselect - the actual query plans are identical, so the subquery wasn't faster.
mdma
@mdma: With only one row, there probably is no difference in practice. Still, it's good to get into the habit of using `IN` or `EXISTS` when you don't actually need a true `JOIN` - oftentimes (but not always) it will lead to a more efficient plan. Especially on, shall we say, *differently-abled* DBMSes like mysql.
Aaronaught
Differently-abled! hehe! :-) Thanks for the advice, I'll keep that in mind. I kind of have a wired in knee jerk reaction when relating two tables to use a join, I'll have to unlearn that habit.
mdma
+2  A: 

An Inner join is a symmetrical, bi-directional relationship, in general it doesn't matter, but in this case I would suggest not joining at all, read the threshold date into a variable and pass it to the other select query as a parameter...

Charles Bretana
+2  A: 

For readability I would inner join from the transaction table to explicitly indicate that the second table with the date just acts like a filter.

Raj
+2  A: 

When joining tables, the query optimizer takes a quick sniff of both to determine the most appropriate join implementation. Logically the inner join is symmetric, but the implementation may favor one side over the other for improved performance.

mdma
A: 

A cross join will do it:

SELECT t.col1, t.col2, p.process_date
FROM Transactions t, Process p;
dportas
@David. No it wouldn't. It would return **all** records from Transactions. That is not what OP wants.
Lieven
Ah, missed that. In that case it is just an inner join. Thanks for the correction.
dportas
If you change your answer I can do something about that -1.
Keng