views:

79

answers:

4

(Using Oracle)

I have a table with key/value pairs like this:

create table MESSAGE_INDEX
(
  KEY               VARCHAR2(256) not null,
  VALUE             VARCHAR2(4000) not null,
  MESSAGE_ID        NUMBER not null
)

I now want to find all the messages where key = 'someKey' and value is 'val1', 'val2' or 'val3' - OR value is null in which case there will be no entry in the table at all. This is to save space; there would be a large number of keys with null values if I stored them all.

I think this works:

SELECT message_id
FROM message_index idx
WHERE ((key = 'someKey' AND value IN ('val1', 'val2', 'val3'))
      OR NOT EXISTS (SELECT 1 FROM message_index WHERE key = 'someKey'
      AND idx.message_id = message_id))

But is is extremely slow. Takes 8 seconds with 700K records in message_index and there will be many more records and more search criteria when moving outside of my test environment.

Primary key is key, value, message_id:

  add constraint PK_KEY_VALUE primary key (KEY, VALUE, MESSAGE_ID)

And I added another index for message_id, to speed up searching for missing keys:

create index IDX_MESSAGE_ID on MESSAGE_INDEX (MESSAGE_ID)

I will be doing several of these key/value lookups in every search, not just one as shown above. So far I am doing them nested, where output id's of one level is the input to the next. E.g.:

SELECT message_id from message_index
WHERE (key/value compare)
AND message_id IN
  (
    SELECT ... and so on
  )

What can I do to speed this up?

A: 

If you have a key that all messages are guaranteed to have:

SELECT  message_id
FROM    message_index mi
WHERE   mi.key = 'GuaranteedKey'
        AND mi.message_id IN
        (
        SELECT  message_id
        FROM    message_index mk
        WHERE   mk.key = 'someKey'
                AND mk.value IN (1, 2, 3)
        )
UNION ALL
SELECT  message_id
FROM    message_index mi
WHERE   mi.key = 'GuaranteedKey'
        AND mi.message_id NOT IN
        (
        SELECT  message_id
        FROM    message_index mk
        WHERE   mk.key = 'someKey'
        )

If you don't:

WITH    mi AS
        (
        SELECT  DISTINCT message_id
        FROM    message_index
        )
SELECT  message_id
FROM    mi
WHERE   mi.message_id IN
        (
        SELECT  message_id
        FROM    message_index mk
        WHERE   mk.key = 'someKey'
                AND mk.value IN (1, 2, 3)
        )
UNION ALL
SELECT  message_id
FROM    mi
WHERE   mi.message_id NOT IN
        (
        SELECT  message_id
        FROM    message_index mk
        WHERE   mk.key = 'someKey'
        )
Quassnoi
I don't see how your second sample would include records where key is not present at all. mi.Key = 'GuarenteedKey' (did you mean NonGuarenteedKey?) will never match.
@user: sorry, just forgot to remove the condition.
Quassnoi
Could you show me how to combine multiple criteria? Like also having 'someKey2' matching value (4, 5, 6).
@user: just add another `IN` predicate into the `WHERE` clause.
Quassnoi
A: 

To speed this up you convert subselects into joins, so your query would become something like this:

SELECT idx.message_id
FROM message_index idx
LEFT JOIN message_index idx2 ON idx2.message_id = idx.message_id AND idx2.key = 'someKey'
WHERE (idx.key = 'someKey' AND idx.value IN ('val1', 'val2', 'val3'))
   OR idx2.message_id IS NULL
Imre L
In `Oracle`, `NOT IN` and `LEFT JOIN / IS NULL` perform the same: http://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/
Quassnoi
Thanks for the link.
Imre L
A: 

Hi user72185,

I'm not sure your second filter is what you are looking for. Basically the subquery:

(SELECT 1
   FROM message_index
  WHERE key = 'someKey'
    AND idx.message_id = message_id)

won't contain rows only if there is no key 'someKey' for that message_id in the table.

If this is really what you want, and since all columns are NOT NULL, you could rewrite the query with a NOT IN (that will probably be optimized into an HASH ANTI-JOIN):

SELECT message_id
  FROM message_index idx
 WHERE (key = 'someKey' AND VALUE IN ('val1', 'val2', 'val3'))
UNION ALL
SELECT message_id
  FROM message_index    
 WHERE message_id NOT IN (SELECT message_id 
                            FROM message_index 
                           WHERE key = 'someKey');
Vincent Malgrat
+1  A: 

"What can I do to speed this up?"

Use a normalized data model rather than a key-value store. Reconstructing the (especially optional) attributes of a message is going to be a continual performance bugbear.

Gary
New keys will be added while the system is in production, so normalizing would mean adding new columns to a large table at runtime. But maybe this isn't a problem? I'm beginning to consider this since it is so much simpler.
ALTER TABLE ... ADD COLUMN is pretty trivial if the existing rows are 'happy' with the new value being null. 11g can even cope with assigning a default to the existing columns without a hassle.Only takes a few seconds, though would need an exclusive lock for that time.If you need to be running 24x7 99.999%, then you may need some of the enterprise options for reorganisation.
Gary