tags:

views:

1223

answers:

10

When I started writing database queries I didn't know the JOIN keyword yet and naturally I just extended what I already knew and wrote queries like this:

SELECT a.someRow, b.someRow 
FROM tableA AS a, tableB AS b 
WHERE a.ID=b.ID AND b.ID= $someVar

Now that I know that this is the same as an INNER JOIN I find all these queries in my code and ask myself if I should rewrite them. Is there something smelly about them or are they just fine?


EDIT:

My answer summary: There is nothing wrong with this query BUT using the keywords will most probably make the code more readable/maintainable.

My conclusion: I will not change my old queries but I will correct my writing style and use the keywords in the future.

THANKS for your answers!

+4  A: 

Nothing is wrong with the syntax in your example. The 'INNER JOIN' syntax is generally termed 'ANSI' syntax, and came after the style illustrated in your example. It exists to clarify the type/direction/constituents of the join, but is not generally functionally different than what you have.

Support for 'ANSI' joins is per-database platform, but it's more or less universal these days.

As a side note, one addition with the 'ANSI' syntax was the 'FULL OUTER JOIN' or 'FULL JOIN'.

Hope this helps.

Jared
+3  A: 

I avoid implicit joins; when the query is really large, they make the code hard to decipher

With explicit joins, and good formatting, the code is more readable and understandable without need for comments.

Cade Roux
+1 Yes. And a missing join predicate is easier to spot. The JOIN syntax makes a SQL statement that is wrong look wrong, it's more obvious that there's a problem.
spencer7593
+6  A: 

It's more of a syntax choice. I prefer grouping my join conditions with my joins, hence I use the INNER JOIN syntax

SELECT a.someRow, b.someRow
FROM tableA AS a
INNER JOIN tableB AS b
  ON a.ID = b.ID
WHERE b.ID = ?

(? being a placeholder)

R. Bemrose
+6  A: 

The more verbose INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN are from the ANSI SQL/92 syntax for joining. For me, this verbosity makes the join more clear to the developer/DBA of what the intent is with the join.

Forgotten Semicolon
AND it makes it easier to spot a missing join predicate. The JOIN syntax makes a wrong statement look wrong. A missing join predicate is harder to spot in the older style syntax.
spencer7593
+5  A: 

In SQL Server there are always query plans to check, a text output can be made as follows:

SET SHOWPLAN_ALL ON
GO

DECLARE @TABLE_A TABLE
(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Data VARCHAR(10) NOT NULL
)
INSERT INTO @TABLE_A
SELECT 'ABC' UNION 
SELECT 'DEF' UNION
SELECT 'GHI' UNION
SELECT 'JKL' 

DECLARE @TABLE_B TABLE
(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Data VARCHAR(10) NOT NULL
)
INSERT INTO @TABLE_B
SELECT 'ABC' UNION 
SELECT 'DEF' UNION
SELECT 'GHI' UNION
SELECT 'JKL' 

SELECT A.Data, B.Data
FROM
    @TABLE_A AS A, @TABLE_B AS B
WHERE
    A.ID = B.ID

SELECT A.Data, B.Data
FROM
    @TABLE_A AS A
    INNER JOIN @TABLE_B AS B ON A.ID = B.ID

Now I'll omit the plan for the table variable creates, the plan for both queries is identical though:

 SELECT A.Data, B.Data  FROM   @TABLE_A AS A, @TABLE_B AS B  WHERE   A.ID = B.ID
  |--Nested Loops(Inner Join, OUTER REFERENCES:([A].[ID]))
       |--Clustered Index Scan(OBJECT:(@TABLE_A AS [A]))
       |--Clustered Index Seek(OBJECT:(@TABLE_B AS [B]), SEEK:([B].[ID]=@TABLE_A.[ID] as [A].[ID]) ORDERED FORWARD)
 SELECT A.Data, B.Data  FROM   @TABLE_A AS A   INNER JOIN @TABLE_B AS B ON A.ID = B.ID
  |--Nested Loops(Inner Join, OUTER REFERENCES:([A].[ID]))
       |--Clustered Index Scan(OBJECT:(@TABLE_A AS [A]))
       |--Clustered Index Seek(OBJECT:(@TABLE_B AS [B]), SEEK:([B].[ID]=@TABLE_A.[ID] as [A].[ID]) ORDERED FORWARD)

So, short answer - No need to rewrite, unless you spend a long time trying to read them each time you maintain them?

Meff
+2  A: 

It also depends on whether you are just doing inner joins this way or outer joins as well. For instance, the MS SQL Server syntax for outer joins in the WHERE clause (=* and *=) can give different results than the OUTER JOIN syntax and is no longer supported (http://msdn.microsoft.com/en-us/library/ms178653(SQL.90).aspx) in SQL Server 2005.

Sean Carpenter
+17  A: 

Filtering joins solely using WHERE can be extremely inefficient in some common scenarios. For example:

SELECT * FROM people p, companies c 
    WHERE p.companyID = c.id AND p.firstName = 'Daniel'

Most databases will execute this query quite literally, first taking the Cartesian product of the people and companies tables and then filtering by those which have matching companyID and id fields. While the fully-unconstrained product does not exist anywhere but in memory and then only for a moment, its calculation does take some time.

A better approach is to group the constraints with the JOINs where relevant. This is not only subjectively easier to read but also far more efficient. Thusly:

SELECT * FROM people p JOIN companies c ON p.companyID = c.id
    WHERE p.firstName = 'Daniel'

It's a little longer, but the database is able to look at the ON clause and use it to compute the fully-constrained JOIN directly, rather than starting with everything and then limiting down. This is faster to compute (especially with large data sets and/or many-table joins) and requires less memory.

I change every query I see which uses the "comma JOIN" syntax. In my opinion, the only purpose for its existence is conciseness. Considering the performance impact, I don't think this is a compelling reason.

Daniel Spiewak
Can you provide examples of database systems that do Cartesian products on the old join syntax? The old syntax is common enough that I think most DBMS optimizers would know not to do this.
crosstalk
I also adhere for what Michael wrote. Definitely there is a need for [citation needed] inside StackOverflow.
Camilo Díaz
Ask your DBMS to explain both statements to you. A good system will have the old SQL and the SQL 1992 join syntax perform equivalent operations.
dlamblin
You say "most databases" generate a "cartesian product". Not so. Many relational databases (DB2, Oracle, Sybase) predate the SQL-92 JOIN syntax. Before that, all of the join predicates were in the WHERE clause. A missing join predicate is what causes cartesian products. Most databases choose efficient access paths, and only choose a "Cartesian product" if that path has the lowest computed cost (or if it was the only access path available, or if the statement were hinted). The big advantage of the SQL-92 JOIN syntax is that it's more obvious that a join predicate is missing from a statement.
spencer7593
+3  A: 

In general:

Use the JOIN keyword to link (ie. "join") primary keys and foreign keys.

Use the WHERE clause to limit your result set to only the records you are interested in.

ilitirit
+2  A: 

The one problem that can arise is when you try to mix the old "comma-style" join with SQL-92 joins in the same query, for example if you need one inner join and another outer join.

SELECT *
FROM table1 AS a, table2 AS b
 LEFT OUTER JOIN table3 AS c ON a.column1 = c.column1
WHERE a.column2 = b.column2;

The problem is that recent SQL standards say that the JOIN is evaluated before the comma-join. So the reference to "a" in the ON clause gives an error, because the correlation name hasn't been defined yet as that ON clause is being evaluated. This is a very confusing error to get.

The solution is to not mix the two styles of joins. You can continue to use comma-style in your old code, but if you write a new query, convert all the joins to SQL-92 style.

SELECT *
FROM table1 AS a
 INNER JOIN table2 AS b ON a.column2 = b.column2
 LEFT OUTER JOIN table3 AS c ON a.column1 = c.column1;
Bill Karwin
+2  A: 

Another thing to consider in the old join syntax is that is is very easy to get a cartesion join by accident since there is no on clause. If the Distinct keyword is in the query and it uses the old style joins, convert it to an ANSI standard join and see if you still need the distinct. If you are fixing accidental cartesion joins this way, you can improve performance tremendously by rewriting to specify the join and the join fields.

HLGEM