Hello,

I have a table with 3 nvarchar(255) columns, and the combination of these 3 columns must be unique. Normally I would create a unique constraint, but in this case I am hitting the 900-byte index key limit. Since I have to support SQL Server 2000, I cannot use included columns to work around this. How else can I enforce uniqueness across the three columns?
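For illustration, this is roughly what I would normally do (table and column names are placeholders):

-- Warns at create time: the key can be up to 3 x 255 characters x 2 bytes
-- = 1530 bytes, over the 900-byte limit, and inserts whose combined length
-- actually exceeds 900 bytes will then fail.
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT UQ_MyTable_3cols UNIQUE (col_1, col_2, col_3)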

Thanks!

A: 

255 seems pretty arbitrary. With that in mind, any chance that nvarchar(150) columns would be adequate? Or some combination that comes in at <= 900 bytes? (nvarchar uses 2 bytes per character, so three nvarchar(150) columns are exactly 3 × 150 × 2 = 900 bytes.)

Joel Coehoorn
I am stuck with the 255 character limit.
Michelle
+2  A: 

If I understand you correctly, you could use a trigger that, on INSERT or UPDATE, checks whether the combination of values is already present and rolls back the change if it is. You could also do the same check in a stored procedure.
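A minimal sketch of such a trigger (the dbo.Test_Unique_Limit table and string_1 through string_3 column names are assumed for illustration):

CREATE TRIGGER trg_Test_Unique_Limit_NoDupes
ON dbo.Test_Unique_Limit
FOR INSERT, UPDATE
AS
    -- Reject the change if any affected triple now appears more than once.
    IF EXISTS (
        SELECT t.string_1, t.string_2, t.string_3
        FROM dbo.Test_Unique_Limit t
        INNER JOIN inserted i
            ON  t.string_1 = i.string_1
            AND t.string_2 = i.string_2
            AND t.string_3 = i.string_3
        GROUP BY t.string_1, t.string_2, t.string_3
        HAVING COUNT(*) > 1
    )
    BEGIN
        RAISERROR ('Duplicate string_1/string_2/string_3 combination.', 16, 1)
        ROLLBACK TRANSACTION
    END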

mattruma
Yes. Just count the number of rows with the triple and make sure that it isn't greater than 1. You may want non-unique indices on each of the three columns to make this faster.
tvanfosson
A: 

You could hash the 3 columns, using MD5 or SHA1, and create a unique constraint on the hashed value.

How easy it is to implement depends on where your INSERTs/UPDATEs come from.

If they only come from your application, then it should be relatively easy to implement.

If they come from multiple sources you can look into implementing the hash in T-SQL. A quick Google search turned up the following implementation of MD5 in T-SQL: http://binaryworld.net/Main/CodeDetail.aspx?CodeId=3600. On SQL2005, you can use the built-in hash function: http://msdn.microsoft.com/en-us/library/ms174415.aspx.

Note that MD5 is no longer considered "secure", though it should be sufficient for this scenario. It runs quicker and is easier to implement than SHA1.
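As a rough sketch of the idea using the SQL 2005 built-in (table and column names are assumed; on SQL 2000 you would populate the column from the T-SQL or application-side hash instead):

-- Persist a hash of the three columns (with a delimiter, so that
-- ('ab','c') and ('a','bc') do not hash to the same input) and index it.
ALTER TABLE dbo.Test_Unique_Limit ADD hash_value AS
    HASHBYTES('MD5', string_1 + N'|' + string_2 + N'|' + string_3) PERSISTED

CREATE UNIQUE INDEX IX_Test_Unique_Limit_hash
    ON dbo.Test_Unique_Limit (hash_value)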

Brannon
Beat me by a few seconds ;)
Joel Coehoorn
:) .. I wasn't aware that SQL 2000 had an XP for MD5... interesting! Do you have a link?
Brannon
Except that you could have spurious collisions between triples that hash to the same value. Your keys are up to 1530 bytes long and you hash into a 128-bit (16-byte) value, so the key space is vastly larger than the hash space and distinct triples can map to the same hash.
tvanfosson
@tvanfosson while it's true that you may have collisions, in practice collisions with MD5 are rare. Also, you can reduce the potential for collisions by hashing each column separately and concatenating the 3 hashes (a 48-byte value), rather than combining all 3 columns into a single 16-byte hash.
Brannon
The rarity of collisions depends on the density of the key space. If sparse, they are rare. The problem is that this solution depends on the luck of not having a collision. If the app can tolerate false positives it would work, but I prefer a solution that doesn't have this problem.
tvanfosson
A: 

Another idea: could you add a column that stores a hash of those values concatenated together? SQL 2000 supports an extended stored procedure for MD5 sums that you could call from a trigger to populate the hash.

Joel Coehoorn
A: 

I was working on this in parallel, but it looks like an extension of some ideas already proposed... You should be able to create an indexed view using CHECKSUM to handle this.

As an example:

CREATE TABLE dbo.Test_Unique_Limit (
    string_1 NVARCHAR(255) NOT NULL,
    string_2 NVARCHAR(255) NOT NULL,
    string_3 NVARCHAR(255) NOT NULL )
GO
CREATE VIEW dbo.Test_Unique_Limit_View
WITH SCHEMABINDING
AS
    SELECT string_1, string_2, string_3, CHECKSUM(string_1, string_2, string_3) AS CHKSUM
    FROM dbo.Test_Unique_Limit
GO
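-- NOTE: this enforces uniqueness of the checksum itself, so two distinct
-- triples that happen to share a CHECKSUM value would also be rejected.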
CREATE UNIQUE CLUSTERED INDEX Test_Unique_Limit_IDX ON dbo.Test_Unique_Limit_View (CHKSUM)
GO
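A quick way to see it working (hypothetical values; note that on SQL 2000 the session needs the usual indexed-view SET options, e.g. ARITHABORT ON, for these inserts to succeed):

INSERT INTO dbo.Test_Unique_Limit VALUES (N'a', N'b', N'c')  -- succeeds
INSERT INTO dbo.Test_Unique_Limit VALUES (N'a', N'b', N'c')  -- fails: duplicate key on Test_Unique_Limit_IDX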

I haven't used this method in production, so I can't vouch for either its performance or complete accuracy. You'll need to do your own testing there.

One thing that I will point out though... the fact that you're running into this issue should make you take a step back and consider whether the model itself is really as good as it should be. A unique index across three 255-character strings seems a bit odd.

Good luck!

Tom H.
A: 

I don't have access to a SQL Server 2000 instance so I couldn't test this there, but assuming it supports scalar functions it ought to work. You can make it faster by adding non-unique indexes on each column so that the check doesn't have to do a table scan. (Note that the table has to exist before you create the function that references it.)

CREATE TABLE [dbo].[tbl_test](
    [string_1] [nvarchar](255) NOT NULL,
    [string_2] [nvarchar](255) NOT NULL,
    [string_3] [nvarchar](255) NOT NULL
) ON [PRIMARY]
GO

-- Counts the rows matching a given triple; the row being checked is included
-- in the count, so a unique triple yields exactly 1.
CREATE FUNCTION [dbo].[fn_count_rows_for_ids]
(
    @id1 nvarchar(255),
    @id2 nvarchar(255),
    @id3 nvarchar(255)
)
RETURNS int
AS
BEGIN
    DECLARE @count int

    SELECT @count = COUNT(*)
    FROM dbo.tbl_test
    WHERE string_1 = @id1 AND string_2 = @id2 AND string_3 = @id3

    RETURN @count
END
GO

ALTER TABLE [dbo].[tbl_test] WITH CHECK
    ADD CONSTRAINT [CK_tbl_test]
    CHECK (([dbo].[fn_count_rows_for_ids]([string_1],[string_2],[string_3]) <= (1)))

ALTER TABLE [dbo].[tbl_test] CHECK CONSTRAINT [CK_tbl_test]
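A quick sanity check (hypothetical values):

INSERT INTO dbo.tbl_test VALUES (N'a', N'b', N'c')  -- succeeds: count = 1
INSERT INTO dbo.tbl_test VALUES (N'a', N'b', N'c')  -- fails the CHECK: count would be 2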
tvanfosson