views:

1027

answers:

5

On Query 1 a full table scan is being performed even though the id is an indexed column. Query 2 achieves the same result but much faster. If Query 1 is run returning an indexed column then it returns quickly but if non-indexed columns are returned or the entire row is then the query takes longer.

In Query 3 it runs fast but the column 'code' is a VARCHAR2(10) in stead of a NUMBER(12) and is indexed the same way as 'id'.

Why does Query 1 not pick up that it should use the index? Is there something that should be changed to allow indexed number columns to perform quicker?

[Query 1]

select a1.*
from people a1
where a1.id like '119%' 
and rownum < 5

Explain Plan
SELECT STATEMENT ALL_ROWS
Cost: 67 Bytes: 2,592 Cardinality: 4 2 COUNT STOPKEY     1 TABLE ACCESS FULL TABLE people
     Cost: 67 Bytes: 3,240 Cardinality: 5

[Query 2]

select a1.*
from people a1, people a2
where a1.id = a2.id
and a2.id like '119%' 
and rownum < 5

Explain Plan
SELECT STATEMENT ALL_ROWS
Cost: 11 Bytes: 2,620 Cardinality: 4
5 COUNT STOPKEY     4 TABLE ACCESS BY INDEX ROWID TABLE people
    Cost: 3 Bytes: 648 Cardinality: 1         3 NESTED LOOPS
        Cost: 11 Bytes: 2,620 Cardinality: 4
            1 INDEX FAST FULL SCAN INDEX people_IDX3
            Cost: 2 Bytes: 54,796 Cardinality: 7,828
            2 INDEX RANGE SCAN INDEX people_IDX3
            Cost: 2 Cardinality: 1

[Query 3]

select a1.*
from people a1
where a1.code like '119%' 
and rownum < 5

Explain Plan
SELECT STATEMENT ALL_ROWS
Cost: 6 Bytes: 1,296 Cardinality: 2
   3 COUNT STOPKEY
      2 TABLE ACCESS BY INDEX ROWID TABLE people
      Cost: 6 Bytes: 1,296 Cardinality: 2
         1 INDEX RANGE SCAN INDEX people_IDX4
         Cost: 3 Cardinality: 2

A: 

Try placing a hint in one of your queries to force it to use the desired index and then check your plan: it could be that (due to skewing or whatever) the optimzer does take the index into account, but decides against using it because of the perceived cost.

davek
Since OP does not understand indexes, I would not recommend using any hints.
We tried using a hint which did force the index to be used and resulted in a quick result however we are not going to be able to use hints from our frontend. Query 3 shows that when the same thing is performed on a VARCHAR2 column runs great without hints. I would like to find a solution that doesn't require hints. A change to the table would be possible (except for changing the column type).
James Collins
+1  A: 

Optimizer decided that it's faster to do a table scan, most probably due to low number of actual records.

Also, you should know that non-exact matching is always way worse than exact. If your where was "a1.id='123456'", it would most probably use index. But then again, even index takes two reads (first find a record in the index, then read the block from table) and for very small tables it could decide for table scan.

The table has 200k rows in it.
James Collins
Because of the ROWNUM predicate, the query won't necessarily full scan the entire table. it will stop when it finds the appropriate number of rows.
Gary
Explain plan on query 1 shows that there is a full table scan with the rownum.
James Collins
A: 

The LIKE keyword tells SQL that you are doing a regular expression match. You should never use regular expressions in SQL or in any programming library until you have checked the string functions available to see if the query could be expressed simply with them. In this case, you could change this to an equals condition by only comparing the substring consisting of the first 3 characters of the code. In Oracle, this would look like:

SELECT *
FROM people
WHERE SUBSTR(code,1,3) = '119'
Michael Dillon
I think I'm right in saying that function based indexes in oracle (such as using upper-casing, substrings and the like) are only available in the enterprise version: which may not always be an option.
davek
Your query using the substr performs a full table scan. We are using like not regexp_like.
James Collins
Function based indexes are suported in standard edition since 10g if i'm not mistaken. substr + function index is probably the best solution.
LIKE is NOT a regular expression function. The closes is REGEXP_INSTR.And a substring is no more likely to use an index than a LIKE.
Gary
`LIKE` is not a regular expression matching, it is plain [pattern matching][1]. You confused it with [REGEXP_LIKE][2][1]: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/conditions007.htm#BABGDGIH[2]: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/conditions007.htm#i1048942
Sergey Stadnik
I did not confuse LIKE with REGEXP_LIKE. In any SQL, LIKE does a regular expression match, it just uses a simple and limited regular expression engine unlike the newer REGEXP_LIKE which uses a full regular expression engine. Both forms of LIKE do pattern matching. Both forms of LIKE use regular expressions. But the newer form has a more complete implementation of regular expressions than plain LIKE. The bottom line is that pattern matching is more computationally expensive and more complex when it comes to execution plan analysis.
Michael Dillon
@Michael Can you back that up with examples? I tried 100000 substr() operations against 100000 equivalent LIKEs and LIKE was ~30% faster.
jva
+2  A: 

LIKE is a string function, so a numeric index can't be used as easily. In numeric index, you'll have 119,120,130,..,1191,1192,1193...,11921,11922... etc. That is all the rows starting with the '119' won't be in the same place, so the whole index has to be read (hence the FAST FULL SCAN). In a character based index they will be together (eg '119','1191','11911','120',...) so a better RANGE SCAN can be used.

If you were looking for id values in a particular range (eg 119000 to 119999) then specify that as the predicate (id between 119000 and 119999).

Gary
+7  A: 

LIKE pattern-matching condition expects to see character types as both left-side and right-side operands. When it encounters a NUMBER, it implicitly converts it to char. Your Query 1 is basically silently rewritten to this:

SELECT a1.*
  FROM people a1
 WHERE TO_CHAR(a1.id) LIKE '119%'
   AND ROWNUM < 5

That happens in your case, and that is bad for 2 reasons:

  1. The conversion is executed for every row, which is slow;
  2. Because of a function (though implicit) in a WHERE predicate, Oracle is unable to use the index on A1.ID column.

To get around it, you need to do one of the following:

  1. Create a function-based index on A1.ID column:

    CREATE INDEX people_idx5 ON people (TO_CHAR(id));

  2. If you need to match records on first 3 characters of ID column, create another column of type NUMBER containing just these 3 characters and use a plain = operator on it.

  3. Create a separate column ID_CHAR of type VARCHAR2 and fill it with TO_CHAR(id). Index it and use instead of ID in your WHERE condition.

    Of course if you choose to create an additional column based on existing ID column, you need to keep those 2 synchronized.You can do that in batch as a single UPDATE, or in an ON-UPDATE trigger, or add that column to the appropriate INSERT and UPDATE statements in your code.

Sergey Stadnik
It might also be possible to rewrite the predicate as ID BETWEEN 1190000 and 1199999, if the values are all of the same order of magnitude. Or if they're not then ID = 119 OR ID BETWEEN 1190 and 1199 etc..
David Aldridge
Thank you this was a very complete explanation and I appreciated your listed solutions. The function based index seems to work nicely.
James Collins