tags:

views:

374

answers:

6

Doing some refactoring in some legacy code I've found in a project. This is for MSSQL. The thing is, i can't understand why we're using mixed left and right joins and collating some of the joining conditions together.

My question is this: doesn't this create implicit inner joins in some places and implicit full joins in others?

I'm of the school that just about anything can be written using just left (and inner/full) or just right (and inner/full) but that's because i like to keep things simple where possible.

As an aside, we convert all this stuff to work on oracle databases as well, so maybe there's some optimization rules that work differently with Ora?

For instance, here's the FROM part of one of the queries:

        FROM    Table1
        RIGHT OUTER JOIN Table2
            ON Table1.T2FK = Table2.T2PK
        LEFT OUTER JOIN Table3
        RIGHT OUTER JOIN Table4
        LEFT OUTER JOIN Table5
            ON Table4.T3FK = Table5.T3FK
               AND Table4.T2FK = Table5.T2FK
        LEFT OUTER JOIN Table6
        RIGHT OUTER JOIN Table7
            ON Table6.T6PK = Table7.T6FK
        LEFT OUTER JOIN Table8
        RIGHT OUTER JOIN Table9
            ON Table8.T8PK= Table9.T8FK
            ON Table7.T9FK= Table9.T9PK
            ON Table4.T7FK= Table7.T7PK
            ON Table3.T3PK= Table4.T3PK
        RIGHT OUTER JOIN ( SELECT   *
                           FROM     TableA
                           WHERE    ( TableA.PK = @PK )
                                    AND ( TableA.Date BETWEEN @StartDate
                                                                    AND     @EndDate )
                         ) Table10
            ON Table4.T4PK= Table10.T4FK
            ON Table2.T2PK = Table4.T2PK
A: 

I may be missing something here, but the only difference between LEFT and RIGHT joins is which order the source tables were written in, and so having multiple LEFT joins or multiple RIGHT joins is no different to having a mix. The equivalence to FULL OUTERs could be achieved just as easily with all LEFT/RIGHT than with a mix, n'est pas?

skaffman
+1  A: 

It could probably be converted to use all LEFT joins: I'd be looking and moving the right-hand table in each RIGHT to be above all the existing LEFTs, then you might be able to then turn every RIGHT join into a LEFT join. I'm not sure you'll get any FULL joins behind the scenes -- if the query looks like it is, it might be a quirk of this specific query rather than a SQL Server "rule": that query you've provided does seem to be mixing it up in a rather confusing way.

As for Oracle optimisation -- that's certainly possible. No experience of Oracle myself, but speaking to a friend who's knowledgeable in this area, Oracle (no idea what version) is/was fussy about the order of predicates. For example, with SQL Server you can write your way clause so that columns are in any order and indexes will get used, but with Oracle you end up having to specify the columns in the order they appear in the index in order to get best performance with the index. As stated - no idea if this is the case with newer Oracle's, but was the case with older ones (apparently).

Whether this explains this particular construction, I can't say. It could simply be less-thean-optimal code if it's changed over the years and a clean-up is what it's begging for.

Chris J
The order in which you specified columns only mattered in Oracle when you used the Rule-Based Optimiser. This was replaced with the Cost-Based Optimiser quite a long time ago and the RBO has been deprecated for a while.
Jeffrey Kemp
+4  A: 

One thing I would do is make sure you know what results you are expecting before messing with this. Wouldn't want to "fix" it and have different results returned. Although honestly, with a query that poorly designed, I'm not sure that you are actually getting correct results right now.

To me this looks like something that someone did over time maybe even originally starting with inner joins, realizing they wouldn't work and changing to outer joins but not wanting to bother changing the order the tables were referenced in the query.

Of particular concern to me for maintenance purposes is to put the ON clauses next to the tables you are joining as well as converting all the joins to left joins rather than mixing right and left joins. Having the ON clause for table 4 and table 3 down next to table 9 makes no sense at all to me and should contribute to confusion as to what the query should actually return. You may also need to change the order of the joins in order to convert to all left joins. Personally I prefer to start with the main table that the others will join to (which appears to be table2) and then work down the food chain from there.

HLGEM
You should see the DB schema! oh, and those PK-FK relationships are really just implicit, and by no means enforced by the database. We've got more orphans than a Monty Python movie.
Beta033
A: 

We have some LEFT OUTER JOINs and RIGHT OUTER JOINs in the same query. Typically such queries are large, have been around a long time, probably badly written in the first place and have received infrequent maintenance. I assume the RIGHT OUTER JOINs were introduced as a means of maintaining the query without taking on the inevitable risk when refactoring a query significantly.

I think most SQL coders are most confortable with using all LEFT OUTER JOINs, probably because a FROM clause is read left-to-right in the English way.

The only time I use a RIGHT OUTER JOIN myself is when when writing a new query based on an existing query (no need to reinvent the wheel) and I need to change an INNER JOIN to an OUTER JOIN. Rather than change the order of the JOINs in the FROM clause just to be able to use a LEFT OUTER JOIN I would instead use a RIGHT OUTER JOIN and this would not bother me. This is quite rare though. If the original query had LEFT OUTER JOINs then I'd end up with a mix of LEFT- and RIGHT OUTER JOINs, which again wouldn't bother me. Hasn't happened to me yet, though.

Note that for SQL products such as the Access database engine that do not support FULL OUTER JOIN, one workaround is to UNION a LEFT OUTER JOIN and a RIGHT OUTER JOIN in the same query.

onedaywhen
Mixing right and left joins bugs me, I guess because in my mind I always think of one side as "I will definitely get my data", and the other side as "I MIGHT get my data", and I always use LEFT JOIN. Mixing them up makes it harder for me to keep track of what data I can count on having.
RedFilter
+1  A: 

LEFT and RIGHT join are pure syntax sugar.

Any LEFT JOIN can be transformed into a RIGHT JOIN merely by switching the sets.

Pre-9i Oracle used this construct:

WHERE  table1.col(+) = table2.col

, (+) here denoting the nullable column, and LEFT and RIGHT joins could be emulated by mere switching:

WHERE  table1.col = table2.col(+)

In MySQL, there is no FULL OUTER JOIN and it needs to be emulated.

Ususally it is done this way:

SELECT  *
FROM    table1
LEFT JOIN
        table2
ON      table1.col = table2.col
UNION ALL
SELECT  *
FROM    table1
RIGHT JOIN
        table2
ON      table1.col = table2.col
WHERE   table1.col IS NULL

, and it's more convenient to copy the JOIN and replace LEFT with RIGHT, than to swap the tables.

Note that in SQL Server plans, Hash Left Semi Join and Hash Right Semi Join are different operators.

For the query like this:

SELECT  *
FROM    table1
WHERE   table1.col IN
        (
        SELECT  col
        FROM    table2
        )

, Hash Match (Left Semi Join) hashes table1 and removes the matched elements from the hash table in runtime (so that they cannot match more than one time).

Hash Match (Right Semi Join) hashes table2 and removes the duplicate elements from the hash table while building it.

Quassnoi
A: 

The bottom line is that this is a very poorly formatted SQL statement and should be re-written. Many of the ON clauses are located far from their JOIN statements, which I am not sure is even valid SQL.

For clarity's sake, I would rewrite the query using all LEFT JOINS (rather than RIGHT), and locate the using statements underneath their corresponding JOIN clauses. Otherwise, this is a bit of a train wreck and is obfuscating the purpose of the query, making errors during future modifications more likely to occur.

doesn't this create implicit inner joins in some places and implicit full joins in others?

Perhaps you are assuming that because you don't see the ON clause for some joins, e.g., RIGHT OUTER JOIN Table4, but it is located down below, ON Table4.T7FK= Table7.T7PK. I don't see any implicit inner joins, which could occur if there was a WHERE clause like WHERE Table3.T3PK is not null.

The fact that you are asking questions like this is a testament to the opaqueness of the query.

RedFilter
Well, the statement is technically valid according to MSSQL as it parses correctly and runs. As to the validity of the results, however, your guess is as good as mine.
Beta033
Yeah, I meant valid according to ANSI 92, but I find that documentation very hard to read, so I am too lazy to check for myself :)
RedFilter