views:

696

answers:

4

I have a table in Oracle 10 with following structure

Create Table Bookmarks(  
    BOOKMARKID NUMBER(10,0) NOT NULL PRIMARY KEY,
    URL VARCHAR2(4000 CHAR) NOT NULL UNIQUE
)

URL has a unique constraint set thus a unique index. There are about 1 million records in this table. I have to frequently check if a bookmark already exists in the table. I am issuing the following query

Select bookmarkid from Bookmarks where URL='<some url>'

The problem is that as the number of records have grown so has the performance gone down. Now It takes significantly longer to return the bookmark id specially when the query URL is long. In the Explain Plan the query does use Unique index on URL column. Any suggestions for improving response time?

+3  A: 

You would typically use a hash index for this. In mssql I would create a persisted computed column that did like CRC(url). Then when you want to check for existance, you look up WHERE crc('some url') = PersistedCrcColumn AND URL='some url'

You have to include the original check with the crc check since you can occassionally get CRC collisions.

EDIT - changing my description above from 'hash lookup' to 'hash index' to avoid confusion. Some dbs have hash indexes as a first class index, I don't believe oracle does (and I know mssql doesn't). If it is not supported intrinsically, the above approach is how you implement it manually.

+2  A: 

Calcuate MD4 index for your URL and assign it in a trigger :

:new.HASH := DBMS_CRYPTO.hash(UTL_RAW.cast_to_raw(:new.url), 1)

Create an index on this column and search by the hash value.

Don't forget to GRANT EXECUTE on DBMS_CRYPTO to the user owning the trigger.

MD4 is the fastest hashing algorithm, that's why it's widely used in places where cryptographical strength is not important.

Quassnoi
Is a trigger needed? Why not a function based index?
tuinstoel
@tuinstoel: UTL_RAW.cast_to_raw is not deterministic.
Quassnoi
Ahh, it should be deterministic, shouldn't it?
tuinstoel
It should, but it is not :(
Quassnoi
You can create a wrapper function myhash that wraps dbms_crypto and declare it be deterministic. It is an ugly solution/workaround, the trigger is better.
tuinstoel
I'd go with ORA_HASHhttp://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions107.htm#i1235081Not sure on the performance of the hashing algorithm, but avoiding the switch between PL/SQL and SQL would probably outweigh that.
Gary
`ORA_HASH` is only 2^32 bits long, collations are very probable.
Quassnoi
REPLACE('collations', 'collisions') in the comment above.
Quassnoi
+3  A: 

Make index that includes bookmarkid column also. Like this:

create IX on bookmarks (url, bookmarkid);
Dmitry
I'm voting up Dmitry's answer - in case it is not obvious, including the bookmarkid column in the index eliminates the need for Oracle to visit the table segment blocks to retrieve the information - only the index is accessed. If the BOOKMARKS table was relatively static then creating it as an IOT (indexed organized table) would be a somewhat similar approach.
dpbradley
+2  A: 

Check that the execution plan is using a UNIQUE scan on the index, not a full / fast full / range scan. You've probably got a lot of URLS starting with http://www. (and maybe a lot more of the leading values too), which isn't ideal. If you can, go with creating a non unique function based index on ORA_HASH(url) and add that to the queries. It will give a much smaller indexed value with a much better spread of values. If changing the queries is not possible, try recreating the index with REVERSE.

It won't help with the size of the indexed value, but may spread it better.

Gary