ansaurus

Question

How to enforce DB integrity with non-unique foreign keys?

Answer 1

A:

I would use and association table to link the many versions to the one pk.

Preet Sangha 2009-10-09 12:51:37

Could you explain this in more detail? Is your solution similar to Mark's?

DzinX 2009-10-10 08:51:26

yes. A another table link resolves the many-many link

Preet Sangha 2009-10-10 09:11:17

Answer 2

A:

Okay, first thing is that you need to normalize your tables. Google "database normalization" and you'll come up with plenty of reading. The PERSONS table, in particular, needs attention.

Second thing is that when you're creating foreign key references, 99.999% of the time you want to reference an ID (numeric) value. I.e., [DOGS].[owner] should be a reference to [PERSONS].[id].

Edit: Adding an example schema (forgive the loose syntax). I'm assuming each dog has only a single owner. This is one way to implement Person history. All columns are not-null.

Persons Table:
int Id
varchar(30) name
...

PersonHistory Table:
int Id
int PersonId (foreign key to Persons.Id)
int Version (auto-increment)
varchar(30) name
...

Dogs Table:
int Id
int OwnerId (foreign key to Persons.Id)
varchar(30) name
...

The latest version of the data would be stored in the Persons table directly, with older data stored in the PersonHistory table.

Jon Seigel 2009-10-09 12:56:45

First, as for normalization, OK, I can throw away "PERSONS.id", but that won't solve my problem at all. Second, if DOGS.owner would reference PERSONS.id, it would point to a person with fixed version, not the latest, which is exactly what I want to avoid.

DzinX 2009-10-09 13:10:46

@DzinX: Please do the Google search I suggested, as by your comment it's clear you don't understand what normalization means.

Jon Seigel 2009-10-09 13:13:42

Ouch, that hurt! FYI, I understand very well what normalization means. If you have an idea how to solve my problem, please explain it in more detail and maybe provide at least a part of a solution.

DzinX 2009-10-09 13:23:08

@DzinX: "First, as for normalization, OK, I can throw away "PERSONS.id", ..." -- this is not normalization.

Jon Seigel 2009-10-09 13:48:12

It is, as the (name, version) pair implies id and vice versa, so I can discard either id or version (because name is referenced by DOGS table). I want to keep the version for reasons explained in my post, so the only thing I can discard is id. But this doesn't help me at all! I can't see the solution you have in mind. What normalization would you apply to my tables and how would it help?

DzinX 2009-10-09 13:56:26

@DzinX: Unless you've omitted something in your original post, there is no reason to believe (name, version) pairs are unique.

Jon Seigel 2009-10-09 14:17:03

Quote: "I thought that a good idea would be to have two columns that identify the row: (name, version)." That aside, from what I see, your solution is similar to Mark's and suffers from the same problem: there's no guarantee that there exists at least one PersonVersion for each Dog (linked through Persons table). Can this be fixed somehow?

DzinX 2009-10-10 08:55:40

@DzinX: Edited my answer to move to a different structure of how history is stored. The two solutions posted aren't good because you'll have dangling data generated by avoiding the circular foreign key reference. This would be the only way to enforce your constraint at a database level. If you can constrain the data on an application level, my previous edit will work fine. (Note: the constraint reduces to ensuring a history exists for each person, since a dog is guaranteed to have an owner by the not-null foreign key in that table.)

Jon Seigel 2009-10-12 17:24:08

Answer 3

+1 A:

Alternatively you could maintain a person history table for the data that has historic value. This way you keep your Persons and Dogs table tidy and the references simple but also have access to the historically interesting information.

Michael Gattuso 2009-10-09 12:56:55

Yeah, I could do that, but then I should keep the current data in two copies: one in CurrentPerson, and one (with highest version) in PersonHistory, so that I don't have to switch between tables when pointing to a person from a document to be printed. This means that I have to make sure that these two entries contain the same data... that's it! Your solution gave me a good idea! Thanks! I'll post it as a separate answer.

DzinX 2009-10-10 09:16:02

Answer 4

A:

A project I have worked on addressed a similar problem. It was a biological records database where species names can change over time as new research improved understanding of taxonomy.

However old records needed to remain related to the original species names. It got complicated but the basic solution was to have a NAME table that just contained all unique species names, a species table that represented actual species and a NAME_VERSION table that linked the two together. At any one time there would be a preferred name (ie the currently accepted scientific name for the species) which was a boolean field held in name_version.

In your example this would translate to a Details table (detailsid, otherdetails columns) a link table called DetailsVersion (detailsid, personid) and a Person Table (personid, non-changing data). Relate dogs to Person.

andyb 2009-10-09 13:00:55

Hmm from what I see, I now have four tables and still no guarantee that there exists a Details row that corresponds to a Dog row. Am I missing something?

DzinX 2009-10-10 09:04:59

Answer 5

A:

Persons

id (int),
name,
.....
activeVersion (this will be UID from personVersionInfo)

note: Above table will have 1 row for each person. will have original info with which person was created.

PersonVersionInfo

UID (unique identifier to identify person + version),
id (int),
name,
.....
versionId (this will be generated for each person)

Dogs

DogID,
DogName
......

PersonsWithDogs

UID,
DogID

EDIT: You will have to join PersonWithDogs, PersionVersionInfo, Dogs to get the full picture (as of today). This kind of structure will help you link a Dog to the Owner (with a specific version).

In case the Person's info changes and you wish to have latest info associated with the Dog, you will have to Update PersonWithDogs table to have the required UID (of the person) for the given Dog.

You can have restrictions such as DogID should be unique in PersonWithDogs.
And in this structure, a UID (person) can have many Dogs.

Your scenarios (what can change/restrictions etc) will help in designing the schema better.

shahkalpesh 2009-10-09 13:06:49

Also a dog can have multiple owners which also happens.

HLGEM 2009-10-09 19:23:24

@HLGEM: Yes and that is the reason, I said "Your scenarios (what can change/restrictions etc) will help in designing the schema better" :)

shahkalpesh 2009-10-10 04:57:29

I don't like the way you have to update PersonsWithDogs each time you change Person's info. This way, I could have my 2 starting tables and have DOGS.owner_id instead of DOGS.owner_name and update this id on each person's change. This can be costly if a person has many dogs.

DzinX 2009-10-10 09:08:40

Answer 6

A:

Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the first of them. Please vote which one you like better.

Solution 1

In PERSONS table, we leave only the name (unique identifier) and a link to current person's data:

TABLE PERSONS:
    name:            varchar(30),
    current_data_id: int

We create a new table, PERSONS_DATA, that contains all data history for that person:

TABLE PERSONS_DATA:
    id:      int
    version: int (auto-generated)
    ... // some data, like address, etc.

DOGS table stays the same, it still points to a person's name (FK to PERSONS table).

ADVANTAGE: for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)

DISADVANTAGE: if you want to change a person's data, you have to:

add a new PERSONS_DATA row
update PERSONS entry for this person to point to the new PERSONS_DATA row.

DzinX 2009-10-10 09:31:56

Answer 7

A:

Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the second of them. Please vote which one you like better.

Solution 2

In PERSONS table, we leave only the name (unique identifier) and a link to the first (not current!) person's data:

TABLE PERSONS:
    name:            varchar(30),
    first_data_id: int

We create a new table, PERSONS_DATA, that contains all data history for that person:

TABLE PERSONS_DATA:
    id:      int
    name:    varchar(30)
    version: int (auto-generated)
    ... // some data, like address, etc.

DOGS table stays the same, it still points to a person's name (FK to PERSONS table).

ADVANTAGES:

for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
if I want to change a person's data, I don't have to update the PERSONS row, only add a new PERSONS_DATA row

DISADVANTAGE: to retrieve current person's data, I have to either:

choose PERSONS_DATA with given name and highest version (may be expensive)
choose PERSONS_DATA with special version, e.g. "-1", but then I would have to update two PERSONS_DATA rows each time I add new PERSONS_DATA, and in this solution I wanted to avoid having to update 2 rows...

What do you think?

DzinX 2009-10-10 09:33:17

Answer 8

+2 A:

In your parent table, create a unique constraint on (id, version). Add version column to your child table, and use a check constraint to make sure that it is always 0. Use a FK constraint to map (parentid, version) to your parent table.

AlexKuznetsov 2009-10-12 18:48:14

Wonderful! A little too much data in child table, but definitely worth it -- brings simplicity and order to the design. Thanks!

DzinX 2009-10-13 10:05:50

ansaurus

tags:

views:

answers:

How to enforce DB integrity with non-unique foreign keys?

Persons

PersonVersionInfo

Dogs

PersonsWithDogs

related questions