views:

172

answers:

9

We are developing ETL jobs and our consultant have been using "old style" SQL when joining tables

select a.attr1, b.attr1
from table1 a, table2 b
where a.attr2 = b.attr2

instead of using inner join clausule

select a.attr1, b.attr1
from table1 as a inner join table2 as b
   on a.attr2 = b.attr2

My question is that in the long run, is there a risk for using the old "where join"? How long this kind of joins are supported and kept as ANSI standard? Our platform is SQL Server and my primary cause is that in the future these "where joins" are no longer supported. When this happens, we have to modify all our ETL jobs using "inner join" style of joins.

+3  A: 

There are lots of reasons to avoid implicit joins. The big ones are:

  • It can't easily be changed to an outer join.
  • It's easier to forget the join condition with an implicit join.
  • If you mix both implicit and explicit joins you get problems with confusing precedence. Here's an example from a few hours ago: MySQL Syntax error

I don't think they will be removed any time soon, but there are plenty of other reasons to stop using them.

Mark Byers
+2  A: 

As long as they're not using *= and =* for their join syntax (which has been deprecated as of SQL Server 2008 R2), I can't see it going away in the long run, but as Mark Byers says, there are plenty of reasons not to do it.

My biggest concern would be that if they're writing joins like this, what else are they doing that's unconventional?

LittleBobbyTables
+5  A: 

Rather than worry about some possible risk in the future, why not worry about the risk you face right now?

In addition to Mark's points

  • the code is harder to read (and thus understand the purpose of), when the ON clauses are disconnected (sometimes by many lines) from the joined tables. This increases the likelihood of errors when modifying the code.
  • determining what kind of JOIN is being done is harder - you have to go surfing through the WHERE clause and hope that what you see is right
  • finding missing JOIN clauses is much harder, increasing the risk of an indavertent Cartesian join - when you use ANSI syntax, the ON clauses line up nicely, making this trivial
RedFilter
+4  A: 

I doubt that "where joins" would ever be unsupported. It's just not possible to not support them, because they are based on Cartesian products and simple filtering. They actually aren't joins.

But there are many reasons to use the newer join syntax. Among others:

  • readability
  • maintainability
  • easier change to out joins
Stefan Steinegger
+3  A: 

Both syntaxes are supported by the latest versions of ISO SQL (2003,2008). Using commas to specify a cross join in the FROM clause is perfectly standard SQL and is supported by all SQL produts that I have come across. It seems highly unlikely that it ever would or even could be deprecated or desupported within SQL.

dportas
A: 

If you are worried they will be removed from the Standard or from SQL products then worry not. It is not likely to happen ever.

This 'old style' of join is just that: merely a question of style. Those who favour it swear there are situations where 'old style' joins are easier to understand. While I'm not entirely convinced myself, one thing is for sure: the old style joins will be encountered from time to time and re-engineering code to suit your own personal style isn't always appropriate. Therefore, get used to the style and learn to work with it.

onedaywhen
No they aren't just a matter of style, there are very real problems with using this kind of join.
HLGEM
@HLGEM -- you are only saying that because it isn't to your taste.
onedaywhen
Np I'm saying that because it causes wrong queries results and maintenance problems.
HLGEM
@HLGEM -- not if you are used to the style. I appreciate what you are saying but the 'old style' is more common than perhaps you realise. You can try but you won't rid the world of it. Better to make peace with it :)
onedaywhen
Just because something is common is no excuse to use it in new development when it is risky. There is no reason to EVER to use this syntax which has been outdated for 18 years and if you are maintaining a piece of code that uses it, there is no excuse not to fix it just like you would fix bad C# code using an outdated method that was replaced 8 years ago with something less risky. It is bad syntax, there is no excuse to ever use it.
HLGEM
@HLGEM: "There is no reason to EVER to use this syntax" - there is if you're using a database (eg. Oracle up to v.9) that doesn't support the JOIN syntax. Not applicable in this case, but in plenty of others.
Mark Bannister
@HLGEM: What do you mean by "wrong" query results? Do you mean that you think the developer is less likely to make mistakes when using an ON clause than using the same expressions in the WHERE clause?
dportas
Implied right and left join syntax even as far back as SQL Server 2000 gives the wrong results some of the time as it doesn't alawys intrepet it correct as an outer join. Accidental cross joins are not possible with explicit join syntax and thus implied syyntax can return a result set that is wrong but doesn't appear to the developer to be wrong. Queries with a mix of explicit and implicit joins can be misinterpreted by the optimizer and return incorrect results. This is why the old syntax is risky and should not be used.
HLGEM
@HLGEM: The "comma join" syntax is just as much a part of current SQL standard as is the JOIN clause. It's not "old". The comma join is in fact more general the a INNER JOIN. Not all comma joins can be reduced to INNER JOINs because the comma is equivalent to a CROSS JOIN in SQL. Your comments about left and right don't apply unless you are using left right join operators (I agree you should use the OUTER JOIN syntax for that but that's not in question here). More people are familiar with the comma syntax and it is compatible with more software. Those are at least some good reasons to use it.
dportas
"The comma join is in fact more general the a INNER JOIN. Not all comma joins can be reduced to INNER JOINs because the comma is equivalent to a CROSS JOIN in SQL." That is extactly why it should not be used. It is outdated, risky code.
HLGEM
By "risky" I assume you mean "more readable" and "clearer what's going on". However, that's a very subjective judgement. I'm proficient with both syntaxes but I have seen examples where I found that using a mixed combination of INNER JOIN / CROSS JOIN and ON clauses instaed of one WHERE clause made it harder for me to understand what the query meant. Both syntax are valid. Use whichever makes the intention clearer.
dportas
A: 

The right and left join implied syntax *= and =* is deprecated and for good reason, it currently does not return correct results at all times. If they have used this, these MUST be fixed now as they currently put you at risk for incorrect results. This is not an option. The other syntax will continue to work but should be replaced as well for several reasons. First it is very easy to get accidental cross joins which can return bad results or which are fixed by using distinct which can create performance issues.

Another issue is maintenance. If people later add other joins to the query and start mixing implied and explicit joins again you can get wrong results and not even know it. It is VERY VERY bad to leave this kind of crappy code in your codebase. Implied joins are also harder to understand and because they are often written by developers who don't understand joins, they may not be what you need anyway. And if there is a cross join in the query how is the maintainer to know if it was a bug (and accidental cross join) or a deliberate cross join (we do occasionally really need them). I would not accept this code as written. I would insist that the incompetent who wrote it, fix it at no additional charge.

HLGEM
+1  A: 

It is difficult to argue about elegance or ugliness of a certain syntax construction. You just see it or don't. Comma separated join syntax reflects the fundamental feature of Relational Algebra, which asserts the normal form for select-project-join queries. The only kind of join that escapes it (and therefore, warrants a dedicated syntax) is the outer join. Accidental mistakes of missing equality predicates that render join graph disjoint is just a matter of how sophisticated your front-end SQL programming tool is (does it display join graph at all?).

It is not only aesthetics. It is common for production databases to have columns like CREATED_ON, or COMMENTS across many tables. In this case the NATURAL JOIN syntax is plain dangerous.

As Anthony Molinaro (author of popular "SQL Cookbook") eloquently put it: “Old style is short and sweet and perfect. ANSI dumbed it down, and for people who've been developing for sometime, it's wholly unnecessary".

Tegiri Nenashi
A: 

People have had some good points but so far there are two big ones that haven't been mentioned:

  1. Whether or not the old style *= and =* outer joins gave correct results, they also cannot properly denote certain joins. Consider the following query. We want to show all customers who have not placed an order over $100:

    SELECT
    FROM
       Customer C
       LEFT JOIN Order O ON C.OrderID = O.OrderID AND O.OrderTotal >= 100.0;
    WHERE
       O.OrderID IS NULL;
    

    Try and express that the old way... if you can. Without using derived tables.

  2. To me the great value of using proper join clauses is the separation of standard join conditions (which would be applied in almost every query involving those two tables) from the special filters for this query that will return the desired rows:

    SELECT
       C.FullName,
       C.CustomerCode,
       O.OrderDate,
       O.OrderTotal,
       OD.ExtendedShippingNotes
    FROM
       Customer C
       INNER JOIN Order O ON C.CustomerID = O.CustomerID
       INNER JOIN OrderDetail OD ON O.OrderID = OD.OrderID
    WHERE
       C.CustomerStatus = 'Preferred'
       AND O.OrderTotal > 1000.0;
    

    This separation means that the developer looking at the query doesn't have to deal with a bunch of clutter while scanning for this query's distinctives. If he's familiar with the tables he can skip the FROM clause entirely and just read the WHERE clause to get all the information he needs. It's faster. And if you don't care about faster, even when just scanning queries with your eyeballs, I don't want to work with you.

    Now for those who think there's something special about the location of everything when using JOIN syntax, you're wrong. The following query works just as fine as the upper one:

    SELECT
       C.FullName,
       C.CustomerCode,
       O.OrderDate,
       O.OrderTotal,
       OD.ExtendedShippingNotes
    FROM
       Customer C
       CROSS JOIN Order O
       INNER JOIN OrderDetail OD
          ON C.CustomerID = O.CustomerID
          AND C.CustomerStatus = 'Preferred'
          AND O.OrderTotal > 1000.0
    WHERE
       O.OrderID = OD.OrderID;
    

    This query probably even has the exact same execution plan. Surprised? Don't be. Just like the old-style syntax, the optimizer is the one that's responsible for figuring out how to join your tables based on the conditions you give. It doesn't really matter where the conditions are as long as they don't refer to a table that hasn't been mentioned yet.

So what is the big difference between the two styles? If you think the second mixed-up query above is hard to understand and would be a crazy way to write, then, well, you naturally think the old style of query is lame. Because, frankly, putting all the conditions haphazardly into any old place is disorganized. The organization system of JOINs makes sense. If you are used to the old style and don't really like the new style, that's probably because change is unpleasant (for all of us). But once you use it for a while, I'm sure it will grow on you. At least, if it doesn't, I can't possibly understand why.

Emtucifor