views:

281

answers:

6

Hi folks,

I've got two tables and I've added a foreign key constraint. Kewl - works great. Now, is it possible to further constrain that relationship against some data in the parent table?

Basically, I have animals in the parent table, and for the child table wishto only contain data where the parent data are .... um .. mammals.

eg.

Animals
^^^^^^^
AnimalId INT PK NOT NULL IDENTITY
AnimalType TINYINT NOT NULL -- 1: Mammal, 2:Reptile, etc..
Name

Mammals
^^^^^^^
AnimalId INT PK FK NOT NULL
NumberOfMammaryGlads TINYINT NOT NULL

So, i wishto make sure that the AnimalId can only be of type Animals.AnimalType = 1

Is this possible??

I don't want to allow someone to try and insert some info against a reptile, in the child table...

Cheers :)

Edit:

I thought I had to use a Check Constraint (confirmed below from my first two answers - cheers!), but I wasn't sure how to (eg. the sql syntax to refer to the Animals table).

Update:

Alex has a very good post (below) that benchmarks some of the suggestions.... a very good read!

A: 

I think you want to use a Check constraint within the Mammals table.

http://msdn.microsoft.com/en-us/library/ms188258.aspx

Tetraneutron
@Tet: I don't wish to check against the AnimalId value, but against the Animals.AnimalTypeId value.
Pure.Krome
Ah - true - I reverted it to my original answer - to check against the AnimalType - you will need to call a function - which as Alex says in another comment may have an affect on performance.
Tetraneutron
+2  A: 

You can create a CHECK CONSTRAINT on the column.

ALTER TABLE Mammals
ADD CONSTRAINT CHK_AnimalType CHECK (dbo.fnGetAnimalType(animalId) = 1 );

Now you need a function fnGetAnimalType that will return the animalType of the given animalId.

Here is more info from MSDN.

Jeff Meatball Yang
Ah... i need to do this via a FUNCTION! ahhhhhhhhhhhhhhhhhhhhhh... brb :)
Pure.Krome
It will surely run very slowly
AlexKuznetsov
@Alex, a check constraint will affect performance, but your claim that it will run very slowly is very premature.
Jeff Meatball Yang
@Alex: it checks only on insert or update: your denormalised solution would still need a lookup onto Animals. The udf approach is cleaner
gbn
+4  A: 

Have a unique constraint on Animals(AnimalId, AnimalType) Add AnimalType to Mammals, and use a check constraint to make sure it is always 1. Have a FK refer to (AnimalId, AnimalType).

AlexKuznetsov
Duplicating the animalType column on the Mammals table is not advisable, as this increases the table size and goes against data normalization best practices.
Jeff Meatball Yang
The increase in size is minor, and all your logic gets much much simpler. You'll save a lot in CPU cycles and development time.
AlexKuznetsov
Denormalizing to enforce business rules also described here:http://www.devx.com/dbzone/Article/34479/1954and here:http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx
AlexKuznetsov
@Alex: it's still a bad idea, whena trigger or UDF could do it.
gbn
ack :( now i'm caught between solutions! :( which suggestion to follow....
Pure.Krome
Just want to point out that the links point to articles you have written yourself - and at the end of the article (yes, I read it) you say yourself that the cost of using referential integrity is steep. I would suggest that a data access layer outside of SQL should be used to enforce business rules. A database is designed for serving data - not implementing business logic.
Jeff Meatball Yang
@Jeff - i agree with that .. but if it's 'cheap' to have some simple data integrity in the DB, I would like to enforce that, there. If it becomes costly (eg. my Function in the Check Constraint is 'expensive') then I'm happy to drop it. For me, this is very basic and simple data integrety. Business rules should always be outside of the DB. i see this as simple data rules.
Pure.Krome
Jeff, databases are all about managing data, including its integrity. Enforcing data integrity rules outside of the database makes no sense whatsoever! Creating a UNIQUE constraint like this to propagate the subtype down into the child table is a common practice and is relatively harmless in terms of normalisation and performance.
Tony Andrews
Yes, this is common, almost like a SQL 'design pattern', It's usually referred to as 'subclassing'.
onedaywhen
@Jeff, I'd hate to see the data integrity in any database you work with. Data integrity rules should always be enforced at the database level. The application is NOT the only thing that can affect data. This is a rule that must always apply no matter where the data came from and it belongs in the database.
HLGEM
@gbn, check constraints are often better for performance than trigger and far better than a UDF. Check constraints are the first place to enforce a business rule, a trigger should be used only if the logic is so complex that the constraint cannot do the job.
HLGEM
@gbn,The following link describes that the UDF approach has loopholes:http://www.devx.com/dbzone/Article/31985/0/page/3
AlexKuznetsov
@Alex: ...and denormalising data has no issues? YOu may never hit the udf issue but at some point your denormalised data will be out of synch. @HLGEM: A trigger is one solution that avoids denormalising *and* the UDF hole (note UDF is in a check constraint anyway)
gbn
@gbn: because animalType in the child table is included in the FK constraint, the constraint guarantees that animalType value always matches the value in the parent table. As long as the constraint is trusted of course. MVP Hugo Kornelis wrote about trusted constraints:http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx
AlexKuznetsov
+3  A: 

I ran a small benchmark - in this case the approach with a UDF runs almost 100 times slower.

The overhead of an FK in CPU time = 375 ms - 297 ms = 78 ms

The overhead of an UDF in CPU time = 7750 ms - 297 ms = 7453 ms

Here's the Sql code...

-- set up an auxiliary table Numbers with 128K rows:

CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i = 1;
INSERT INTO dbo.Numbers(n) SELECT 1;
WHILE @i<128000 BEGIN
  INSERT INTO dbo.Numbers(n)
    SELECT n + @i FROM dbo.Numbers;
  SET @i = @i * 2;
END;
GO

-- the tables

CREATE TABLE dbo.Animals
(AnimalId INT NOT NULL IDENTITY PRIMARY KEY,
AnimalType TINYINT NOT NULL, -- 1: Mammal, 2:Reptile, etc..
Name VARCHAR(30))
GO
ALTER TABLE dbo.Animals
ADD CONSTRAINT UNQ_Animals UNIQUE(AnimalId, AnimalType)
GO
CREATE FUNCTION dbo.GetAnimalType(@AnimalId INT)
RETURNS TINYINT
AS
BEGIN
DECLARE @ret TINYINT;
SELECT @ret = AnimalType FROM dbo.Animals
  WHERE AnimalId = @AnimalId;
RETURN @ret;
END
GO
CREATE TABLE dbo.Mammals
(AnimalId INT NOT NULL PRIMARY KEY,
SomeOtherStuff VARCHAR(10),
CONSTRAINT Chk_AnimalType_Mammal CHECK(dbo.GetAnimalType(AnimalId)=1)
);
GO

--- populating with UDF:

INSERT INTO dbo.Animals
  (AnimalType, Name)
SELECT 1, 'some name' FROM dbo.Numbers;
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
INSERT INTO dbo.Mammals
(AnimalId,SomeOtherStuff)
SELECT n, 'some info' FROM dbo.Numbers;

results are:

SQL Server parse and compile time: 
CPU time = 0 ms, elapsed time = 2 ms.
Table 'Mammals'. Scan count 0, logical reads 272135, 
    physical reads 0, read-ahead reads 0, lob logical reads 0, 
    lob physical reads 0, lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0, 
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.

SQL Server Execution Times:
    CPU time = 7750 ms,  elapsed time = 7830 ms.

(131072 row(s) affected)

--- populating with FK:

CREATE TABLE dbo.Mammals2
(AnimalId INT NOT NULL PRIMARY KEY,
AnimalType TINYINT NOT NULL,
SomeOtherStuff VARCHAR(10),
CONSTRAINT Chk_Mammals2_AnimalType_Mammal CHECK(AnimalType=1),
CONSTRAINT FK_Mammals_Animals FOREIGN KEY(AnimalId, AnimalType)
  REFERENCES dbo.Animals(AnimalId, AnimalType)
);

INSERT INTO dbo.Mammals2
(AnimalId,AnimalType,SomeOtherStuff)
SELECT n, 1, 'some info' FROM dbo.Numbers;

results are:

SQL Server parse and compile time: 
   CPU time = 93 ms, elapsed time = 100 ms.
Table 'Animals'. Scan count 1, logical reads 132, physical reads 0,
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
Table 'Mammals2'. Scan count 0, logical reads 275381, physical reads 0,
   read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
   lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0,
   read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
   lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 375 ms,  elapsed time = 383 ms.

-- populating without any integrity:

CREATE TABLE dbo.Mammals3
(AnimalId INT NOT NULL PRIMARY KEY,
SomeOtherStuff VARCHAR(10)
);
INSERT INTO dbo.Mammals3
(AnimalId,SomeOtherStuff)
SELECT n,  'some info' FROM dbo.Numbers;

results are:
SQL Server parse and compile time: CPU time = 1 ms, elapsed time = 1 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 66 ms.
Table 'Mammals3'. Scan count 0, logical reads 272135, physical reads 0,
    read-ahead reads 0, lob logical reads 0, lob physical reads 0,
    lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0, 
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 297 ms,  elapsed time = 303 ms.

(131072 row(s) affected)

The overhead of an FK in CPU time = 375 ms - 297 ms = 78 ms
The overhead of an UDF in CPU time = 7750 ms - 297 ms = 7453 ms

AlexKuznetsov
Upvote for the effort
Tetraneutron
So it looks like the fastest solution is to have no relationship/constraint. This is not good for a data integrity perspective. Next is to have th FK solution. Even though this seems like we're doubling up the data on the FK field, the cost of this seems to be worth it over the performance benefits, vs having a check contraint that calls a UDF. Do other people agree with me (and AlexK) on this summary?
Pure.Krome
A: 

To give a strong guarantee, you'll need two check constraints going both ways. If you only constrain Mammals someone could update Animals.AnimalType and get the data in an inconsistent state.

Hans Malherbe
A: 

This sounds like table inheritance - I recently asked a question (can't link because I'm too new here!) whose answer lead to finding this solution.

I'm working with monetary Transactions where a Transaction is the parent table and transaction types such as Cheque, Bank Transfer, and CC are the child tables. Using a lookup table for transaction types, foreign key constraints, and computed columns I was able to enforce the type constraint on each of the child tables.

Check out the article for implementation details.

Duke