tags:

views:

149

answers:

4

Like the title says, if anyone has the answer I would like to know. I've been googling but couldn't find a straight answer.

Example:

This works

SELECT COUNT(*) FROM Table1 TB1, Table2 TB2
WHERE TB1.Field1 = TB2.Table2 

This seems to take hours

SELECT COUNT(*) FROM Table1 TB1, Table2 TB2
WHERE TB1.Field1 <> TB2.Table2 
+1  A: 

Probably, the second query processes way more rows than the first one.

MvanGeest
+12  A: 

Because they are different SQL sentences. In the first one, you are joining two tables using Field1 and Table2 fields. Probably returning a few records.

In the second one, your query is probably returning a lot of records, since you are doing a cross join, and a lot of rows will satisfy your Field1 <> Table2 condition.

A very simplified example

Table1

Field1
------
1
2
5
9

Table2

Table2
------
3
4
5
6
9

Query1 will return 2 since only 5 and 9 are common.

Query2 will return 18 since a lot of rows from cross join will count.

If you have table with a lot of records, it will take a while to process your second query.

Pablo Santa Cruz
Interesting! Your post resembles the one from Joachim Sauer. Your post and his combined is a good combination.
jovany
+1  A: 

(Thinking back to a similar question)

Are you trying to count the rows in Table1 for which there is no matching record in Table2?

If so you could use this

SELECT COUNT(*) FROM Table1 TB1
WHERE NOT EXISTS
     (SELECT * FROM Table2 TB2 
      WHERE TB1.Field1 = TB2.Field2 )

or this for example

SELECT COUNT(*)
FROM
(
SELECT Field1 FROM Table1 
MINUS
SELECT Field2 FROM Table2
) T
Martin Smith
+10  A: 

It's important to realize that SQL is a declarative language and not an imperative one. You describe what conditions you want your data to fit and not how those comparisons should be executed. It's the job of the database to find the fastest way to give you an answer (a task taken over by the query optimizer). This means that a seemingly small change in your query can result in a wildly different query plan, which in turn results in a wildly different runtime behaviour.

The = comparison can be converted to and optimized the same way as a simple join on the two fields. This means that normal indices can be used to execute the query very fast, probably without reading the actual data and using only the indices instead.

A <> comparison on the other hand requires a full cartesian product to be calculated and checked for the condition, usually (there might be a way to optimize this with the correct index, but usually an index won't help here). It will also usually return a lot more results, which adds to the execution time.

Joachim Sauer
Thanks for your explanation. It's a lot clearer now.Your post in combination with Pablo's makes a lot of sense.
jovany