tags:

views: 41

answers: 3

What do you do when you need to maintain a table with unique values but can't use a UNIQUE constraint?

For example, I use MySQL and want to map my URLs to IDs. So I create a table:

CREATE TABLE url (id INTEGER PRIMARY KEY AUTO_INCREMENT, url VARCHAR(2048));

The problem is that MySQL doesn't allow a unique index on a field bigger than 1000 bytes. In general, how do you do an atomic insert-only-if-not-exists in SQL?

+1  A: 

You could use a NOT EXISTS condition:

insert  into YourTable
        (url)
select  'blah blah blah'
from    dual
where   not exists
        (
        select  *
        from    YourTable
        where   url = 'blah blah blah'
        )
Andomar
Are you sure this is atomic, i.e. if the table is large and two such queries execute at once, is it certain that two rows won't get inserted? (This is a genuine question; I'm not saying your code is wrong, I just don't know the answer...)
Adrian Smith
@Adrian Smith: Good point, and you're probably right. This would require the SERIALIZABLE isolation level (a range lock) to be reliable, and that's not what MySQL runs with by default.
Andomar
+1  A: 

In my opinion the best way to handle it is to write a trigger. The trigger checks each new value against the existing rows and raises an error if a duplicate is found. I don't think a URL will often go beyond 1000 characters, but if it does in your case, a trigger can handle the uniqueness check.
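A minimal sketch of such a trigger, assuming the url table from the question and MySQL 5.5 or later for SIGNAL; the trigger name is illustrative:

DELIMITER //
CREATE TRIGGER url_check_unique
BEFORE INSERT ON url
FOR EACH ROW
BEGIN
    -- Reject the insert if the same url is already present
    IF EXISTS (SELECT 1 FROM url WHERE url = NEW.url) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Duplicate url';
    END IF;
END//
DELIMITER ;

Note that, like the NOT EXISTS approach above, this check-then-insert is not guaranteed to be atomic across concurrent transactions.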

Ranhiru Cooray
+5  A: 

You could create an extra field holding a hash of the URL, e.g. MD5, and make that hash field unique. The same URL always produces the same hash, so duplicates are rejected, and with almost 100% certainty you can insert any new URL that isn't already there.
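A minimal sketch of that schema, assuming the url table from the question; the url_md5 column and key names are illustrative:

CREATE TABLE url (
    id       INTEGER PRIMARY KEY AUTO_INCREMENT,
    url      VARCHAR(2048) NOT NULL,
    url_md5  CHAR(32) NOT NULL,      -- hex MD5 of url, well under the 1000-byte index limit
    UNIQUE KEY uq_url_md5 (url_md5)
);

-- Inserting the same URL twice violates the unique key on the hash:
INSERT INTO url (url, url_md5)
VALUES ('http://example.com/page', MD5('http://example.com/page'));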

It is tempting to take a table lock instead; however, LOCK TABLES implicitly commits the transaction you are working on: http://www.databasesandlife.com/mysql-lock-tables-does-an-implicit-commit/

You could create a single-row InnoDB table, e.g. named mutex, insert a row into it, and do a SELECT ... FOR UPDATE on that row to take a lock which is compatible with transactions. It's nasty, but that's the way I do table locks in MySQL in my applications :(
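A minimal sketch of that pattern, assuming the url table from the question; the mutex table name and the example URL are illustrative:

CREATE TABLE mutex (
    id INTEGER PRIMARY KEY
) ENGINE = InnoDB;

INSERT INTO mutex (id) VALUES (1);

-- Around each check-and-insert in the application:
START TRANSACTION;
SELECT id FROM mutex WHERE id = 1 FOR UPDATE;  -- blocks other sessions doing the same
INSERT INTO url (url)
SELECT 'http://example.com/page'
FROM   dual
WHERE  NOT EXISTS (SELECT * FROM url WHERE url = 'http://example.com/page');
COMMIT;  -- releases the row lock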

Adrian Smith
+1 for the hash idea. Even if you need to enforce utter uniqueness somehow, using a hash here will help narrow down the rows to a usable handful for a longer, slower string comparison.
Matt Gibson
The hash will definitely enforce uniqueness (the same URL always produces the same hash, so a duplicate would generate a unique constraint violation on the hash column), but two different URLs could fail to coexist if they happen to hash to the same value. With a full-length hash like MD5 that's really, really unlikely; after all, `git` uses hashes to identify all commits, and if hash collisions were likely that wouldn't work.
Adrian Smith