What algorithms are known to perform the task of updating a database by inserting, updating, and deleting rows in the presence of database constraints?
More specifically, say that before images of rows to be deleted, after images of rows to be inserted, and both images of rows to be updated are in memory. The rows might be for several tables. An exact sequence of updates is either not known or has not been preserved -- only the before images and the after images that the database must ultimately be made to reflect are known.
The database contains primary key, foreign key, and unique index constraints. The problem is to find the sequence of commands to bring the database up to date. For simplicity, I am willing to specify that the primary key of a row will never be modified.
The database system does not support deferred constraint checking. (The solution is trivial for such databases). I must also make the rule that primary key columns may not be updated after insert and it is not allowed to delete a row and re-insert one with the same primary key, even though some algorithms might otherwise find it convenient to do so. (This is necessary for common scenarios where all primary keys are autogenerated by the database system.)
What are the algorithms:
- Assuming that foreign key constraints must be enforced at all times but unique indexes are not used.
- Assuming that foreign key constraints and unique index constraints must be enforced at all times.
I ask for both because I think #1 might be considerably simpler.
EDIT: The goal here is to solve the problem of lack of deferred constraint checking in a general (or nearly general purpose) way. I suppose high-quality ORM packages must do this. I want an explanation of the algorithms, either provided by you here or externally in an academic paper, etc. I will not consider a pointer to a software package or source code an answer to this question.
NAIVE ALGORITHM:
Loop through the tables and for each row that is added, changed, or deleted generate a single INSERT, UPDATE, or DELETE statement, respectively for each. Loop through the generated statements and apply to the database. If a statement fails to apply, continue with the other statements. Retry the statements that have failed. Keep iterating until there are no more failures or a pass successfully executes no statements. If statements remain, attempt temporary tweaking of the data in the problematic columns to try to get them to succeed.
This is an ugly brute force algorithm and figuring out the "temporary tweaking" part is a challenge of its own. So, I would like an improved and complete algorithm.
EDIT2:
RBarryYoung posts an answer which gets close (but no cigar) to fully solving scenario #1 while solving the most common scenario #2 problems simultaneously. The following is an example of a scenario #1 update pattern I have seen very often in applications but have not yet arrived at the solution. DELETE/UPDATE-INSERT is exactly right a lot of the time in scenario #1, but the trick is to figure out when to deviate from it. I also suspect that deviating from it magnifies the occurence of UNIQUE problems per scenario #2, possibly increasing my interest in solving scenario #2 as well.
Notice that there are no cycles nor are any primary keys modified. However, a foreign key to a parent is modified.
CREATE TABLE A
(
AId INT NOT NULL PRIMARY KEY
)
CREATE TABLE B
(
BId INT NOT NULL PRIMARY KEY,
AId INT NOT NULL FOREIGN KEY REFERENCES A (AId)
)
CREATE TABLE C
(
CId INT NOT NULL PRIMARY KEY,
AId INT NOT NULL FOREIGN KEY REFERENCES A (AId),
BId INT NOT NULL FOREIGN KEY REFERENCES B (BId)
)
Before Images:
A (1)
B (1,1)
C (1,1,1)
After Images:
A (1)
B (2,1) [To be deleted: (1,1)]
C (1,1,2)
Sort Order: A,B,C
The first command is DELETE B (1,1) which fails due to C (1,1,1).
Note that if the third column in C allows NULL (which it does not in this case), a pure solution might involve NULLing it out in an early pass as this would allow the given algorithm to proceed normally and have its usual advantages for dealing with most of the scenario #2 problems. An excellent solution to this problem needs to take things like this into account as well. The full generality of this question is undoubtedly a fascinating problem.