views:

103

answers:

6
SELECT *
  FROM openorders_tracking
  WHERE id NOT IN (SELECT tracking_id FROM openorders_commission)

SELECT *
  FROM openorders_tracking
LEFT JOIN openorders_commission
  ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL

I'm wondering both specifically for this query, and in general if there is a significant difference.

I don't have any data yet, nor indexes yet. I guess it's more in general if "NOT IN" is better than a JOIN or vice-versa.

For those who like EXPLAIN, here is the results for now (again, no indexes yet besides the primaries):

mysql> explain SELECT * FROM openorders_tracking WHERE id NOT IN (SELECT trackin
g_id FROM openorders_commission);
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
| id | select_type        | table                 | type   | possible_keys | key
  | key_len | ref  | rows | Extra               |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
|  1 | PRIMARY            | openorders_tracking   | ALL    | NULL          | NUL
L | NULL    | NULL |  341 | Using where         |
|  2 | DEPENDENT SUBQUERY | openorders_commission | system | NULL          | NUL
L | NULL    | NULL |    0 | const row not found |
+----+--------------------+-----------------------+--------+---------------+----
--+---------+------+------+---------------------+
2 rows in set (0.00 sec)

mysql> explain SELECT * FROM openorders_tracking LEFT JOIN openorders_commission
 ON openorders_tracking.id=openorders_commission.tracking_id WHERE openorders_co
mmission.id IS NULL;
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
| id | select_type | table                 | type   | possible_keys | key  | key
_len | ref  | rows | Extra               |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
|  1 | SIMPLE      | openorders_commission | system | PRIMARY       | NULL | NUL
L    | NULL |    0 | const row not found |
|  1 | SIMPLE      | openorders_tracking   | ALL    | NULL          | NULL | NUL
L    | NULL |  341 |                     |
+----+-------------+-----------------------+--------+---------------+------+----
-----+------+------+---------------------+
2 rows in set (0.00 sec)
+3  A: 

It's an odd thing to ask when you have the two queries and the database in front of you. Try running them, and use EXPLAIN to see the execution plans.

My guess is that MySQL will optimize them both into the same execution plan but that might depending column types and your indexing scheme.

Larry Lustig
I don't have data nor indexes just yet. I was just wondering in general. See my edits to the question.
nute
Yes, run them both and see which is faster. I would say the first is easier to read.
Patrick
Also, use a representative data set. Many performance issues only surface when large data sets are used.
Martin
A: 

I was told to limit the number of SELECTs per query, so based on that I would say the JOIN is the most efficient.

Monkieboy
-1 I recommend basing your query-design choices on execution plans and observed performance, rather than [what you've been told](http://en.wikipedia.org/wiki/Cargo_cult_programming).
djacobson
Actually you are partly correct, but everyone should take counsel from people that are experts in their area. I was told not to use too many SELECT statement per query by a DBA, I am a developer not a DBA so I would always trust their subject-matter expertise over my basic understanding. If there is a technical reason why multiple SELECTs would not add an over head you should state why then mark me down for poorly explaining my answer, rather than make assumptions that I cannot think for myself and that I am some kind of zombie that just regurgitates what others say.
Monkieboy
@djacobson - Time to get down off your high horse. Your response has not added to the question or help with why this answer is good or not.
Bronumski
@monkieboy - I think it depends on the query as to how the database engine will optimize it.
Bronumski
A: 

I would use the top query. It's easier to understand, IMHO and its use of "select *" will only select the columns from the table you're after.

However, it's very possible they will have the same execution plan.

Matthew PK
A: 
SELECT *
  FROM openorders_tracking
LEFT JOIN openorders_commission
  ON openorders_tracking.id=openorders_commission.tracking_id
WHERE openorders_commission.id IS NULL

The above said query will be more efficient. They may produce the same performance this time but once you do proper indexing, this query will help you always. So try to index your tables and test both the queries with worst case scenario (try to insert more and more records and you will see the difference)

gajendra.bang
A: 

I have always understood the NOT EXISTS variant of this type of query to be more efficient normally, as it only checks for the first occurence of the queried value in the lookup table - eg:

SELECT *
  FROM openorders_tracking t
  WHERE NOT EXISTS
  (SELECT NULL FROM openorders_commission c
   WHERE c.tracking_id = t.id)

As ever, check the actual performance of the query to determine which is faster.

Mark Bannister
+1  A: 

Attacking this from a purely software development approach I would say that this is premature optimization, what you should be striving for is readability. As to which query is more readable would be the call of you and your team. Whilst this doesn't answer the question, which I think should be answered (not by me but someone more qualified like a DBA), you should always think about what you are going to gain by optimizing.

Taken from Wikipedia (Program optimization)

When to optimize

Optimization can reduce readability and add code that is used only to improve the performance. This may complicate programs or systems, making them harder to maintain and debug. As a result, optimization or performance tuning is often performed at the end of the development stage.

Donald Knuth made the following two statements on optimization:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

(He also attributed the quote to Tony Hoare several years later, although this might have been an error as Hoare disclaims having coined the phrase.)

"In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering"

"Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing.

An alternative approach is to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized. A simple and elegant design is often easier to optimize at this stage, and profiling may reveal unexpected performance problems that would not have been addressed by premature optimization.

In practice, it is often necessary to keep performance goals in mind when first designing software, but the programmer balances the goals of design and optimization.

Bronumski