ansaurus

Question

Answer 1

A:

As far as I know, NOT IN can only be used for 1 field at a time. And the field has to be specified in between "WHERE" and "NOT IN".

(Edit:) Try using a NOT EXISTS:

SELECT a, b, c 
FROM t1 
WHERE NOT EXISTS 
   (SELECT * 
   FROM t2 
   WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)

In addition, an inner join on a, b, and c being equal should give you all non-DISTINCT tuples, while a LEFT JOIN with a WHERE IS NULL clause should give you the DISTINCT ones, as Charles mentioned below.

froadie 2010-02-11 18:00:12

Holy God, that would make a nightmare query. I only care about the values of a, b, or c depending on the values of the other two!

2010-02-11 18:03:15

How about using a join? Inner joining on all 3 fields should return all non-distinct tuples

froadie 2010-02-11 18:05:06

Or how about using exists? SELECT a, b, c FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c). Not sure if this is exactly correct, I haven't had much experience with exists

froadie 2010-02-11 18:12:19

Answer 2

A:

SELECT DISTINCT t1.* FROM t1 LEFT JOIN t2 ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) WHERE t2.a IS NULL

Charles 2010-02-11 18:20:02

This is a very bad idea, if the tables are big that left join will be painfully slow. You're doing alot of work, reading to both tables fully creating a left join result set, etc., which is simply unnecessary.

wich 2010-02-11 18:35:02

Indeed. Not performance wise.

Charles 2010-02-11 19:53:26

Answer 3

+2 A:

You should use not exists:

SELECT DISTINCT a, b, c FROM t1 WHERE NOT EXISTS (SELECT NULL FROM t2 WHERE t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c)

Using NOT IN is not the best method to do this, even if you check only one key. The reason is that if you use NOT EXISTS the DBMS will only have to check indices if indices exist for the needed columns, where as for NOT IN it will have to read the actual data and create a full result set that subsequently needs to be checked.

Using a LEFT JOIN and then checking for NULL is also a bad idea, it will be painfully slow when the tables are big since the query needs to make the whole join, reading both tables fully and subsequently throw away a lot of it. Also, if the columns allow for NULL values checking for NULL will report false positives.

wich 2010-02-11 18:30:41

You know what? I ran this query, and it was taking a long time ( longer than 10 minutes ), so I killed it. Then I set up temp tables for the two tables and inserted distinct information. Then I ran the query against the temp tables. Took one minute and 4 seconds. Why couldn't mysql optimize this query thusly?

2010-02-11 19:41:02

To be honest, mysql is rather stupid and slow in some regards. Oracle, MS SQL and PostgreSQL do a lot better in many regards. Of course it will also be a lot faster if you add indices to the columns of the tables, if you don't have them already. although that will go at the cost of insert time since the indices need to be updated each time, it's a matter of how many times you do each operation and which is the more time critical.

wich 2010-02-12 07:33:25

Answer 4

A:

Need to add a column list after the WHERE clause and REMOVE the alias.

I tested this with a similar table and it is working.

SELECT DISTINCT a, b, c 
FROM t1 WHERE (a,b,c)
NOT IN (SELECT DISTINCT a,b,c FROM t2)

Using the mysql world db:

-- dont include city 1, 2
SELECT DISTINCT id, name FROM city 
WHERE (id, name) 
NOT IN (SELECT id, name FROM city  WHERE ID IN (1,2))

Yada 2010-02-11 18:36:13

It would be better to use NOT EXISTS, NOT IN will force a result set to be made for the sub query by reading the whole table, when you use NOT EXISTS no result set needs to be created for the sub query and if the columns are indexed, the NOT EXISTS will only read the indices.

wich 2010-02-11 18:46:27

Are you sure about that? If the columns are not indexed NOT EXISTS is as slow NOT IN.

Yada 2010-02-11 19:12:09

Answer 5

A:

Well, I'm going to answer my own question, in spite of all the great advice others gave.

Here's the proper syntax for what I was trying to do.

SELECT DISTINCT a, b, c FROM t1 WHERE (a,b,c) NOT IN ( SELECT DISTINCT a,b,c FROM t2 )

Can't vouch for the efficiency of it, but the broader questions I was implicitly putting was "How do I express this thought in SQL", not "How do I get a particular result set". I know that's unfair to everyone who took a stab, sorry!

2010-02-12 03:09:59

Depending on the definition of columns a, b and c of t2 this could be wrong! If they allow for NULL values then the result for the NOT IN will always be unknown for any such value. NOT EXISTS *is* the right way to express this, NOT EXISTS was created for this.

wich 2010-02-12 07:40:25

Awesome! Thanks.

2010-02-12 14:50:32

ansaurus

tags:

views:

answers:

mysql SELECT NOT IN () -- disjoint set?

related questions