views:

576

answers:

5

In MSSQL 2005 I just struck the infamous error message:

Introducing FOREIGN KEY constraint XXX on table YYY may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

Now, StackOverflow has several topics about this error message, so I've already got the solution (in my case I'll have to use triggers), but I'm curious as to why there is such a problem at all.

As I understand it, there are basically two scenarios that they want to avoid - a cycle and multiple paths. A cycle would be where two tables have cascading foreign keys to each other. OK, a cycle can span several tables too, but this is the basic case and will be easier to analyze.

Multiple paths would be when TableA has foreign keys to TableB and TableC, and TableB also has a foreign key to TableC. Again - this is the minimum basic case.

I cannot see any problems that would arise when a record would get deleted or updated in any of those tables. Sure, you might need to query the same table multiple times to see which records need updating/deleting, but is that really a problem? Is this a performance issue?

In other SO topics people go as far as to label using cascades as "risky" and state that "resolving cascade paths is a complex problem". Why? Where is the risk? Where is the problem?

A: 

I agree with that cascades being "risky" and should be avoided. (I personally prefer cascading the changes manually rather that having sql server automatically take care of them). This is also because even if sql server deleted millions of rows, the output would still show up as

(1 row(s) affected)

Yasir Laghari
Would replacing it with a trigger make it better? Of course - there is such an approach as "not using foreign keys at all" and doing everything from code. I don't think I need to tell you why I think that is a bad idea...
Vilx-
A: 

The reason we forbid using cascade delete has to do with performance and locking. Yes it's not so bad when you delete one record but sooner or later you will need to delete a large group of records and your database will comes to a standstill.

If you are deleting enough records, SQL Server might escalate to a table lock and no one can do anything with the table until it is finished.

We recently moved one of our clients to his own server. As part of the deal we also then had to delete all of that client's records form our original server. Deleting all his information in batches (so as not to cause problems with other users) took a couple of months. If we had cascade delete set up, the database would have been inaccessible to the other clients for a long time as millions of records were deleted in one transaction and hundreds of tables were locked until the transaction was done.

I could also see a scenario where a deadlock might have occured in using cascade delete because we have no control over the order the cascade path would have taken and our database is somewhat denormalized with clientid appearing in most tables. So if it locked the one table that had a foreign key also to a third table as well as the client table that was in a differnt path, it possibly couldn't check that table in order to delete from the third table because this is all one transaction and the locks wouldn't be released until it was done. So possibly it wouldn't have let us set up cascade deletes if it saw the possibility of creating deadlocks in the transaction.

Another reason to avoid cascading deletes is that sometimes the existence of a child record is reason enough not to delete the parent record. For instance, if you have a customer table and that customer has had orders in the past, you would not want to delete him and lose the information on the actual order.

HLGEM
If you're dealing with millions of rows then almost always you have to use tricky workarounds like batching, disabling foreign keys, running the job in idle times, etc. But that's not an everyday scenario. People prepare for that sort of thing in advance. They test it in a testing environment. They make long and twisted scripts which tweak the DB so that it would get an acceptable performance. And once it's done it's forgotten. But "ON DELETE CASCADE" is meant to be used in everyday life, when you're not deleting millions of rows but at most a few hundred at a time.
Vilx-
And deadlocks on cascades (or even simple deletes for several rows in the same table) can happen anyway - preventing multiple cascade paths and cycles doesn't do one bit to make it better.
Vilx-
A: 

Consider a table of employees:

CREATE TABLE Employee
(
    EmpID   INTEGER NOT NULL PRIMARY KEY,
    Name    VARCHAR(40) NOT NULL,
    MgrID   INTEGER NOT NULL REFERENCES Employee(EmpID) ON DELETE CASCADE
);

INSERT INTO Employees(     1, "Bill",   1);
INSERT INTO Employees(    23, "Steve",  1);
INSERT INTO Employees(234212, "Helen", 23);

Now suppose Bill retires:

DELETE FROM Employees WHERE Name = "Bill";

Ooooppps; everyone just got sacked!

[We can debate whether the details of the syntax are correct; the concept stands, I think.]

Jonathan Leffler
I'd argue that in this case the ON DELETE CASCADE was used incorrectly. Sorry, it's your own fault if you build a buggy program - it's not the job of the programming language to protect you from writing bad code.
Vilx-
+1  A: 

You have a child table with 2 cascade paths from the same parent: one "delete", one "null".

What takes precedence? What do you expect afterwards? etc

Note: A trigger is code and can add some intelligence or conditions to a cascade.

gbn
A: 

I think whether or not to use a ON DELETE CASCADE option is a question of the business model you are implementing. A relationship between two business objects could be either a simple "association", where both ends of the relationship are related, but otherwise independent objects the lifecycle of which are different and controlled by other logic. There are, however, also "aggregation" relationships, where one object might actually be seen as the "parent" or "owner" of a "child" or "detail" object. There is the even stronger notion of a "composition" relationship, where an object solely exists as a composition of a number of parts. In the "association" case, you usually won't declare an ON DELETE CASCADE constraint. For aggregations or compositions, however, the ON DELETE CASCADE helps you mapping your business model to the database more accurately and in a declarative way. This is why it annoys me that MS SQL Server restricts the use of this option to a single cascade path. If I'm not mistaken, many other widely used SQL database systems do not impose such restrictions.

Mirko Klemm
And this is an answer...how?
Vilx-