views:

1419

answers:

5

I need to design a Key/value table in my database and I'm looking for guidance on the best way to do this. Basically, I need to be able to associate values to a dynamic set of named properties and apply them to an external key.

The operations I need to be able to support are:

  • Apply a key/value pair to a group of items
  • Enumerate all of the currently-active keys
  • Determine all of the items that have a value for a given key
  • Determine all of the items where the value associated with a given key matches some criteria.

It seems that the simplest way to do this is to define a table:

CREATE TABLE KeyValue (
  id    int,
  Key   varchar...,
  Value varchar...
);

It seems that I am likely to be duplicating a lot of data in the Key column because I any given key is likely to be defined for a large number of documents. Replacing the Key varchar with an integer lookup into another table seems to alleviate this problem (and make it significantly more efficient to enumerate all of the active keys), but sticks me with the problem of maintaining that lookup table (upserting into it whenever I want to define a property and potentially removing the entry any time a key/value is cleared).

What's the best way to do this?

A: 

An option that may be worth exploring is digesting the key using SHA1 or MD5 before inserting it into the table.

That will allow you to get rid of the lookup table, but you will not be able to iterate through the keys cause it only goes one way.

Sam Saffron
A: 

Create updatable views! . Also check this for an example.

Learning
A: 

It seems to me like you might have a couple design choices.

Choice 1: A two table design you hinted at in your answer

Keys (
 id int not null auto_increment
 key string/int
)
values (
 id int not null auto_increment
 key_id int
 value string/varchar/int
)

Choice 2: perhaps as sambo99 pointed out you could modify this:

keys (
 id int not null auto_increment
 key string/int
 hash_code int -- this would be computed by the inserting code, so that lookups would effectively have the id, and you can look them up directly
)

values (
 id int not null auto_increment -- this column might be nice since your hash_codes might colide, and this will make deletes/updates easier
 key_id int -- this column becomes optional
 hash_code int
 value string/varchar/int...
)

--

Nathan Feger
chances of SHA1 colliding are pretty much 0 in reality, you need to be super evil to cause collisions, if you are super worried you could use SHA256
Sam Saffron
I've personally seen a hash collision (TTH) in DC++, so it's not impossible.
erikkallen
+2  A: 

Don't optimize this unless you have to. What is the average length of a key? Will this table be so big it won't all fit into your server's memory if you implement it the naive way? I'd suggest implementing it the simplest way, measure the performance, and then re-implement only if performance is a problem.

If performance is a problem, then using an integer key and a separate table is probably the way to go (JOINS on integer columns are typically faster than JOINS using variable-length-string columns). But the first rule of optimizing is MEASURE FIRST-- make sure your supposedly-optimized code actually does make thing run faster.

gavinandresen
+1 simplicity. Unless the performance impact is obvious and severe, always go with what is easiest to work with, then test and optimize as necessary.
Rex M
+3  A: 

You are employing a database model called Entity-Attribute-Value. This is a common way to store key/value pairs in a relational database, but it has a number of weaknesses with respect to database normalization and efficiency.

Yes, the table design you showed is the most common way to do it. In this design, every attribute of every entity gets a distinct row in your KeyValue table.

Apply a key/value pair to a group of items: You need to add one row for each item in the group.

INSERT INTO KeyValue (id, key, value) VALUES (101, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (102, 'color', 'green');
INSERT INTO KeyValue (id, key, value) VALUES (103, 'color', 'green');

You may also prepare the INSERT statement with parameters and run through a number of item id's in a loop, or whatever.

Enumerate all of the currently-active keys:

SELECT DISTINCT Key FROM KeyValue;

Determine all of the items that have a value for a given key:

SELECT id FROM KeyValue WHERE Key = 'color';

Determine all of the items where the value associated with a given key matches some criteria:

SELECT id FROM KeyValue WHERE Value = 'green';

Some of the problems with Entity-Attribute-Value are:

  • No way to make sure keys are spelled the same for all items
  • No way to make some keys mandatory for all items (i.e. NOT NULL in a conventional table design).
  • All keys must use VARCHAR for the value; can't store different data types per key.
  • No way to use referential integrity; can't make a FOREIGN KEY that applies to values of some keys and not others.

Basically, Entity-Attribute-Value is not a normalized database design.

Bill Karwin