views: 95

answers: 4

This is a follow-up to a previous question.

How can I optimize this query so that it does not perform a full table scan?

 SELECT Employee.name FROM Employee WHERE Employee.id <> 1000;


explain SELECT Employee.name FROM Employee WHERE Employee.id <> 1000;
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | Employee    | ALL  | PRIMARY       | NULL | NULL    | NULL | 5000 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+

(Employee.id is the primary key, in case that isn't clear.)

+4  A: 

Create a covering index on name and id, and the database should be able to fulfill the query using just the index. This might be faster, because there's a good chance the entire index will already be in memory, while a table scan is more likely to have to go to disk.

Because of the low (practically non-existent) selectivity of your WHERE clause, you may need to provide a hint to get the database to use the index. I'm a SQL Server guy, so I'm not sure of the syntax needed in MySQL to hint an index, or even whether MySQL can take advantage of a covering index in this manner.
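
A minimal sketch of what this might look like in MySQL (the index name is illustrative, and whether the optimizer uses the index without the hint depends on the version and the statistics, so treat it as something to verify rather than a definitive recommendation):

 -- A covering index holds both the filtered column and the selected column,
 -- so the query can be answered from the index alone.
 CREATE INDEX idx_employee_id_name ON Employee (id, name);

 -- MySQL's index-hint syntax, in case the optimizer still prefers the table scan:
 SELECT Employee.name FROM Employee FORCE INDEX (idx_employee_id_name) WHERE Employee.id <> 1000;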

That said, I doubt you can get much improvement: you're returning every row but one. You should expect that to need to scan the table.

Joel Coehoorn
+1: Thanks, this is something I knew about but didn't make the connection ... experience is king! OK, I've now seen your edit: I think it could at least help, given a sufficiently intelligent DB engine. What you would get is a "full index scan" instead of a "full table scan". Since an index over 2 columns is usually smaller than the whole table, it will be faster (even when not in memory!).
Juergen
Returning all rows except one sounds like a disaster; it's going to do a table scan, or at the very least a full index scan, which may be almost as bad (depending on the schema and data).
MarkR
@Mark: Why does everyone panic when they hear "full table scan"? Just think about it -- what must be returned? All the employee names except one. You at least need to read that information -- and that information plus one more column (the id, which is also needed for the comparison) is what gets read. I would say that in the relational model no better solution is possible. Of course you could always think of specialized models for this very special problem -- but would that be justified???
Juergen
+1  A: 

In traditional databases, you can't!

Of course, you could just skip the employee with the given id (when it is a key or has an index) -- but you will still have the vast majority of the table under your feet, so using an index might complicate things, and a full table scan is normally the faster option.

When you have specialized databases, you could store the names of all employees adjacent to each other.

Edit: I've now seen Joel's answer. Yes, this could be a way, since your special index is in fact a specialized form of storing part of the content. Good databases can use the index content alone when it covers the columns needed -- this is rather nice. Of course, you will end up with a so-called "full index scan" (but that is normally much faster than a full table scan).
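
Roughly, and assuming MySQL with an index covering id and name, the thing to look for in the EXPLAIN output is the access type and the Extra column:

 EXPLAIN SELECT Employee.name FROM Employee WHERE Employee.id <> 1000;
 -- If the engine takes this route, the access type changes from ALL (full table scan)
 -- to index (full index scan), and Extra shows "Using index", meaning the rows
 -- are read from the index without touching the table.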

Juergen
In fact, this will garner no improvement at all, as using the index would decrease performance, not increase it, since every record in the table but one must be read from disk. And if the query optimizer were so foolish as to use the index (thankfully it's smarter than that), it would have to add additional IO operations to read the index pages on top of all the IO reads necessary to do a complete table scan.
Charles Bretana
@Charles: You explained what I wanted to say in fewer words. But there is the exception Joel talked about -- and he is right, at least for some databases (not all might be able to find this optimization).
Juergen
A: 

There are a lot of things to try; it really depends on how the database engine chooses to parse it. Some options:

select employee.name from employee where employee.id not in (1000);

You could also try a union of a less-than query and a greater-than query.
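
A sketch of that rewrite (UNION ALL rather than UNION, so duplicate names are not collapsed and the result stays equivalent to the original <>; whether the optimizer turns this into two range scans is up to the engine):

 select employee.name from employee where employee.id < 1000
 union all
 select employee.name from employee where employee.id > 1000;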

But in the specific example you are giving (which may just be too simple for your real case), a table scan isn't necessarily a bad thing. If all the records except one have to be returned, using an index may in fact be slower.

Yishai
Besides being polite, giving a reason for a downvote helps further discussion. Just downvoting doesn't.
Yishai
@Yishai, sorry, you're right... What I should have said is: when the database sees that it must read more than 20-30% of the pages holding the table's rows, it recognizes that traversing an index (which requires one I/O operation per level in the index) would be more expensive, and it just does a table scan anyway.
Charles Bretana
I thought I said that in my last paragraph. My answer kind of assumes that the question is intended to represent the specific technical challenge (not equals on an indexed field) rather than the specific example. Many times questions on SO are not the real situation in order to keep them simple.
Yishai
+1  A: 

Nothing you can do will increase performance. In this case the database must do a complete table scan, as you are asking for every record save one. Reading every page in an index on top of that would only reduce performance. Fortunately, even if you added an index, the database would be smart enough to ignore it...

EDIT to address @Juergen's comment.
Juergen, you are right about a covering index, but there are conflicting effects here. Any use of an index in a scenario like this has a downside: the query engine may have to perform one I/O operation for each level in the index, for each row it needs to examine. If there are, say, 5 levels in the index and 1M rows, that would be 5 million I/O operations, compared to only 1M I/Os to do a complete table scan. This is why, in this scenario, most query optimizers will ignore any available index and do the table scan anyway (unless you force the index with a hint). The only mitigating factor is if EVERY attribute required by the query is in the index (a covering index) and the index rows are enough smaller than the table rows -- so that many more of them fit on each page -- to counteract the negative effect of having to traverse each level of the index for each row returned by the query.

Charles Bretana
You are right in one way. But there are still databases that are smart enough to fulfill a query from the index when the index covers all the needed columns -- thereby saving the access to the specific table row. A smart enough database engine could use an index on (id, name) to avoid the table access entirely -- performing only a full index scan.
Juergen
Your concept of what requires an IO is wrong. It's not a separate IO operation per record; it's a separate IO operation per _page_. Since we get to design the index, it's going to be 1:1 anyway, and odds are this index will be kept in memory to begin with, so there might not be any disk IO at all. At the very least, the index is likely smaller than the table (assuming there are more columns he hasn't told us about), and so there are fewer total pages to read.
Joel Coehoorn