I have two tables as follows:

tblCountry (countryID, countryCode)

tblProjectCountry(ProjectID, countryID)

The tblCountry table is a list of all countries with their codes, and the tblProjectCountry table associates certain countries with certain projects. I need an SQL statement that gives me a list of the countries, with their country codes, that do NOT have an associated record in the tblProjectCountry table. So far I have got to here:

SELECT     tblCountry.countryID, tblCountry.countryCode
FROM         tblProjectCountry INNER JOIN
                      tblCountry ON tblProjectCountry.countryID = tblCountry.countryID
WHERE     (SELECT     COUNT(ProjectID)
                         FROM         tblProjectCountry 
                         WHERE     (ProjectID = 1) AND (countryID = tblCountry.countryID)) = 0

The above statement parses as correct but doesn't give the exact result I'm looking for. Can anyone help?
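To see why the attempt above misses the target, here is a minimal sketch that runs it against a tiny made-up dataset. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the asker's DBMS, and the three sample countries and two associations are hypothetical. The INNER JOIN drops every country that has no row in tblProjectCountry before the WHERE clause is evaluated, so the unassociated country can never appear in the result.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'MT'), (2, 'PL'), (3, 'GB');
    -- MT belongs to project 1, PL to project 2, GB to no project at all
    INSERT INTO tblProjectCountry VALUES (1, 1), (2, 2);
""")

# The query from the question, unchanged apart from layout.
attempt = conn.execute("""
    SELECT tblCountry.countryID, tblCountry.countryCode
    FROM tblProjectCountry
    INNER JOIN tblCountry ON tblProjectCountry.countryID = tblCountry.countryID
    WHERE (SELECT COUNT(ProjectID)
           FROM tblProjectCountry
           WHERE ProjectID = 1 AND countryID = tblCountry.countryID) = 0
""").fetchall()

# PL is returned (it has a project, just not project 1); GB is missing entirely.
print(attempt)
```

On this data the query returns PL, which is associated with a project other than project 1, and never GB, which is the only country with no association at all.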

A: 

SELECT ... WHERE ID NOT IN (SELECT ... )

Axarydax
Apparently, you should never use IN in SQL to JOIN with another table, according to http://sqlservercode.blogspot.com/2007/04/you-should-never-use-in-in-sql-to-join.html
rohancragg
This is ridiculous. I shouldn't use a language feature because I might make a logical error?!
Axarydax
You should never do anything because I saw a blog post saying you shouldn't.
erikkallen
+3  A: 

Does this work?

SELECT countryID, countryCode 
  FROM tblCountry 
  WHERE countryID NOT IN ( SELECT countryID FROM tblProjectCountry )
tim_yates
This one worked like a charm, perfect thanks!
William Calleja
Although it works, I would say that this is not the most correct answer. Apparently, you should never use IN in SQL to JOIN with another table, according to http://sqlservercode.blogspot.com/2007/04/you-should-never-use-in-in-sql-to-join.html. My answer shows how to use EXISTS instead.
rohancragg
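One well-known caveat with the NOT IN query above (not necessarily the one the linked blog post describes) is its behaviour when the subquery returns a NULL. The sketch below assumes SQLite via Python's sqlite3 module and made-up rows, and it assumes tblProjectCountry.countryID is nullable, which the question does not state. With a NULL in the list, `x NOT IN (...)` evaluates to unknown for every non-matching row, so nothing is returned, while NOT EXISTS is unaffected.

```python
import sqlite3

# In-memory SQLite database as a stand-in; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'MT'), (2, 'PL'), (3, 'GB');
    -- Assumption: countryID is nullable, and one association row has no country
    INSERT INTO tblProjectCountry VALUES (1, 1), (2, NULL);
""")

# NOT IN: the NULL in the subquery result makes the test unknown for PL and GB.
not_in = conn.execute("""
    SELECT countryID, countryCode
    FROM tblCountry
    WHERE countryID NOT IN (SELECT countryID FROM tblProjectCountry)
    ORDER BY countryID
""").fetchall()

# NOT EXISTS: only checks whether a matching row exists, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT c.countryID, c.countryCode
    FROM tblCountry AS c
    WHERE NOT EXISTS (SELECT 1 FROM tblProjectCountry AS pc
                      WHERE pc.countryID = c.countryID)
    ORDER BY c.countryID
""").fetchall()

print(not_in)      # no rows at all
print(not_exists)  # PL and GB
```

If countryID in tblProjectCountry is declared NOT NULL, the two queries return the same rows and the choice comes down to readability and performance.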
+1  A: 

There are, at least, two ways to find unassociated records.

1. Using LEFT JOIN

SELECT DISTINCT -- each country only once
  tblCountry.countryID,
  tblCountry.countryCode
FROM
  tblCountry 
  LEFT JOIN
    tblProjectCountry
  ON
    tblProjectCountry.countryID = tblCountry.countryID
WHERE
  tblProjectCountry.ProjectID IS NULL -- get only records with no pair in projects table
ORDER BY
  tblCountry.countryID

As erikkallen mentioned, this may not perform well.

2. Using NOT EXISTS

Various versions using NOT EXISTS or NOT IN were suggested by rohancragg and others:

SELECT
  tblCountry.countryID,
  tblCountry.countryCode
FROM
  tblCountry 
WHERE
  -- get only records with no pair in projects table
  NOT EXISTS (SELECT TOP 1 1 FROM tblProjectCountry WHERE tblProjectCountry.countryID = tblCountry.countryID) 
ORDER BY
  tblCountry.countryID

Depending on your DBMS and the sizes of the countries and projects tables, either version could perform better.

In my test on MS SQL 2005 there was no significant difference between the first and second queries for tables with ~250 countries and ~5000 projects. However, on a table with over 3M projects, the second version (using NOT EXISTS) performed much, much better.

So, as always, it's worth checking both versions.

Grzegorz Gierlik
SQL Server will not recognize this as an anti-join, so it will be forced to do a left join + filter, which may or may not be better, but is probably a lot worse.
erikkallen
True. I checked this on MS SQL 2005 with a table of 253 countries JOIN-ed to tables with ~5k rows and with over 3M rows. For the table with over 3M rows, `LEFT JOIN` was much slower. However, for the table with 5k rows the cost was similar to the version with `NOT EXISTS (SELECT TOP 1 1 FROM...)` suggested by [@rohancragg](http://stackoverflow.com/questions/2490839/how-can-i-make-an-sql-statement-that-finds-unassociated-records/2490911#2490911).
Grzegorz Gierlik
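Checking both versions, as suggested above, can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite via Python's sqlite3 module rather than MS SQL 2005, the data is generated (250 countries, 5000 association rows, with country IDs 201 to 250 deliberately left unassociated), and the plans SQLite prints say nothing about how SQL Server would execute the same queries. It confirms the two queries return the same rows and prints each plan for inspection.

```python
import sqlite3

# In-memory SQLite database as a stand-in; all rows are generated sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
""")
conn.executemany("INSERT INTO tblCountry VALUES (?, ?)",
                 [(i, f"C{i:03d}") for i in range(1, 251)])
# Projects only reference countries 1..200, so 201..250 stay unassociated.
conn.executemany("INSERT INTO tblProjectCountry VALUES (?, ?)",
                 [(p, p % 200 + 1) for p in range(5000)])

LEFT_JOIN = """
    SELECT c.countryID, c.countryCode
    FROM tblCountry AS c
    LEFT JOIN tblProjectCountry AS pc ON pc.countryID = c.countryID
    WHERE pc.countryID IS NULL
    ORDER BY c.countryID
"""
NOT_EXISTS = """
    SELECT c.countryID, c.countryCode
    FROM tblCountry AS c
    WHERE NOT EXISTS (SELECT 1 FROM tblProjectCountry AS pc
                      WHERE pc.countryID = c.countryID)
    ORDER BY c.countryID
"""

left_join_rows = conn.execute(LEFT_JOIN).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
print(left_join_rows == not_exists_rows, len(left_join_rows))

# Print SQLite's plan for each query; the exact text varies by SQLite version.
for name, sql in (("LEFT JOIN", LEFT_JOIN), ("NOT EXISTS", NOT_EXISTS)):
    print(name)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row)
```

On SQL Server the equivalent check would be to compare the actual execution plans and timings of the two statements against realistic data volumes.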
+3  A: 

Another alternative:

SELECT outerTbl.countryID, outerTbl.countryCode 
    FROM tblCountry AS outerTbl
    WHERE NOT EXISTS 
        (
            SELECT countryID FROM tblProjectCountry WHERE countryID = outerTbl.countryID
        )

This uses what's called a correlated subquery.

Note that I also make use of the EXISTS keyword.

On SQL Server, NOT EXISTS is generally thought to be more performant. On other RDBMSs your mileage may vary.

rohancragg
The inner `SELECT countryID FROM ...` could be replaced with `SELECT TOP 1 1 FROM ...`. Proof of existence is good enough.
Grzegorz Gierlik
Thanks, I never thought of that, although I always thought it was pointless to select something that is never 'used'.
rohancragg
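The point made in the comments, that EXISTS only cares whether the subquery returns a row and not what it selects, can be illustrated with a small sketch. It assumes SQLite via Python's sqlite3 module and made-up rows; `TOP` is SQL Server syntax that SQLite does not support, so a plain `SELECT 1` is used instead, and the `unassociated` helper is a hypothetical name introduced only for this example.

```python
import sqlite3

# In-memory SQLite database as a stand-in; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCountry (countryID INTEGER PRIMARY KEY, countryCode TEXT);
    CREATE TABLE tblProjectCountry (ProjectID INTEGER, countryID INTEGER);
    INSERT INTO tblCountry VALUES (1, 'MT'), (2, 'PL'), (3, 'GB');
    INSERT INTO tblProjectCountry VALUES (1, 1), (1, 2), (2, 1);
""")

def unassociated(select_list):
    """Run the NOT EXISTS query with the given subquery select list.

    select_list is spliced into the SQL text, so only pass trusted literals.
    """
    sql = f"""
        SELECT c.countryID, c.countryCode
        FROM tblCountry AS c
        WHERE NOT EXISTS (SELECT {select_list}
                          FROM tblProjectCountry AS pc
                          WHERE pc.countryID = c.countryID)
        ORDER BY c.countryID
    """
    return conn.execute(sql).fetchall()

with_column = unassociated("pc.countryID")  # selects a real column
with_constant = unassociated("1")           # selects a constant
with_null = unassociated("NULL")            # even NULL counts as a row

print(with_column, with_constant, with_null)
```

All three variants return the same single unassociated country, since the subquery's select list is never inspected.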