views:

132

answers:

7

Given the schema

PERSON { name, spouse }

where PERSON.spouse is a foreign key to PERSON.name, NULLs will be necessary when a person is unmarried or we don't have any info.

Going with the argument against nulls, how do you avoid them in this case?

I have an alternate schema

PERSON { name }
SPOUSE { name1, name2 }

where SPOUSE.name* are FKs to PERSON. The problem I see here is that there is no way to ensure someone has only one spouse (even with all possible UNIQUE constraints, it would be possible to have two spouses).

What's the best way to factor out nulls in bill-of-materials style relations?

+1  A: 

Well, first I would use auto-incrementing IDs as, of course, someone could have the same name. But, I assume you intend to do that and won't harp on it. However, how does the argument against NULLs go exactly? I don't have any problem with NULLs and think that is the appropriate solution to this problem.

BobbyShaftoe
Some relational theorists, especially Chris Date, are rabidly anti-NULL because a NULL can mean either unknown or does-not-exist. Also, three valued logic sucks.
Dave
I've read some of those anti-NULL arguments; I guess I am just non-plussed by them and that leads us to this sort of problem. I think solving problems where there are NULLs are, for the most part, not very difficult.
BobbyShaftoe
Having run into problems with NULLs, I'm willing to go with the theorists, but I'm not sure how my solution above is any better than just leaving it the way it was, as you've recommended.
Dave
A: 

Well, begin with using a key other than name, perhaps a int seed. But to prevent someone from having more than one spouse, simply add a unique index to the parent(name1) in the spouse table. that will prevent you from ever inserting the same name1 twice.

Brian Rudolph
But the same name can show up in the name1 column and in the name2 column.
Steven Huwig
A: 

You can use a trigger to enforce the constraint. PostgreSQL has constraint triggers, a particularly nice way to defer the constraint evaluation until the appropriate time in the transaction.

From Fabian Pascal's Practical Issues in Database Management, pp. 66-67:

Stored procedures—whether triggered or not—are preferable to application level integrity code, but they are practically inferior to and riskier than declarative support because they are more burdensome to write, error prone, and cannot benefit from full DBMS optimization.

...

Choose DBMSs with better declarative integrity support. Given the considerable gaps in such support by products, knowledgeable users would be at least in a position to emulate correctly—albeit with procedural and/or application code—constraints not supported by the DBMS.

Steven Huwig
That was my first thought, but I'd think performance could become an issue for very large instances of BOM relations.
Dave
The performance concern is the same no matter how the constraint is spelled. If your DBMS supported subqueries within declarative check constraints, the same cost would be incurred.
Steven Huwig
It's a bit worrisome if you have to introduce a lot of triggers to avoid using NULLs.
BobbyShaftoe
You don't need to introduce triggers to avoid NULLS. You need to introduce triggers when the declarative constraints in SQL are not expressive enough.
Steven Huwig
I have not read any of Pascal's work. What is your opinion of him vs Date vs Celko?
Dave
Date and Pascal have collaborated -- Pascal was a longtime public proponent of Date's criticisms of SQL and current data management practices.In his books, Celko will give you a hack solution without addressing the shortcomings of the hack. Date and Pascal have both criticized him for this.
Steven Huwig
My opinion is that Date and Pascal are right and Celko is wrong, but sometimes you have to break the model to get things to work in today's DBMSes. Your question is not one of those times, however.
Steven Huwig
+2  A: 

I think that enforcing no NULLs and no duplicates for this type of relationship makes the schema definition way more complicated than it really needs to be. Even if you allow nulls, it would still be possible for a person to have more than one spouse, or to have conflicting records e.g:

PERSON { A, B }
PERSON { B, C }
PERSON { C, NULL }

You'd need to introduce more data, like gender (or "spouse-numbers" for same-sex marriages?) to ensure that, for example, only Persons of one type are allowed to have a Spouse. The other Person's spouse would be determined by the first person's record. E.g.:

PERSON { A, FEMALE, B }
PERSON { B, MALE, NULL }
PERSON { C, FEMALE, NULL }

... So that only PERSONs who are FEMALE can have a non-null SPOUSE.

But IMHO, that's overcomplicated and non-intuitive even with NULLs. Without NULLs, it's even worse. I would avoid making schema restrictions like this unless you literally have no choice.

Adam Bellaire
I didn't really consider all the possible problems that using nulls could introduce. I wish I could vote you up more than once.
Dave
These are good points.
BobbyShaftoe
A: 

All right, use Auto-IDs and then use a Check Constraint. The "Name1" column (which would only be an int ID) will be force to only have ODD numbered IDs and Name2 will only have EVEN.

Then create a Unique Constraint for Column1 and Column2.

BobbyShaftoe
That's pretty slick. I need to take a few minutes to convince myself it works.
Dave
What would an insert statement look like for that design?
Steven Huwig
The insert statement would be a plain old insert statement. However, you would have to do some logic before issued the statement to determine which field it should be inserted into.
BobbyShaftoe
I don't see this as a good answer from the relational modeling point of view. You are adding a synthetic attribute to work around the lack of declarative constraints in SQL.
Steven Huwig
I agree it's a little 'hackish', but look at the problems with using NULLs pointed out by Adam Bellaire.
Dave
There is a simple way to avoid both -- implement the constraint in a procedural trigger. This is the method recommended by Fabian Pascal in "Practical Issues in Database Management," and there is not a greater enemy of nulls and procedural code.
Steven Huwig
No greater enemy? Well, perhaps that is true if you have never debugged problems related to triggers, particularly ones you did not create yourself.
BobbyShaftoe
This kind of a hack will almost always lead to bad things in the future. The relational model can easily handle this kind of a relationship with no need for this kind of workaround. In writing an algorithm a cute trick can often save resources. In DB design it will almost always cause headaches.
Tom H.
I don't disagree that hacks are bad, but I still don't see an answer to the question. Sometimes ugly is all you've got.
Dave
I meant that Fabian Pascal is a noted critic of triggers and nulls, and even he recommends a procedural trigger over making your data model confusing.
Steven Huwig
So why all these downvotes? I noted in another answer that I would not go the route of disallowing nulls so I'm just trying to answer within the parameters of the question. I don't say triggers are much worse than this but they are hardly more elegant as well. Tom, your answer didn't solve the issue
BobbyShaftoe
Triggers keep the logical data model the same -- your solutions requires a superfluous attribute.
Steven Huwig
A: 

You need a person TABLE and a separate "Partner_Off" table to define the relationship.

Person (id, name, etc );

Partner_Off (id, partner_id, relationship);

To deal with the more complex social situation you probaly would probably need some dates in there, plus, to simplify the sqls you need one entry for (fred,wilma,husband) and a matching entry for (wilma,fred,wife).

James Anderson
That seems to be the same as the alternate schema in the question. How is it different?
Dave
+1  A: 

I'm not sure why no one has pointed this out yet, but it's actually quite easy to ensure that a person has only one spouse, using pretty much the same model that you have in your question.

I'm going to ignore for the moment the use of a name as a primary key (it can change and duplicates are fairly common, so it's a poor choice) and I'm also going to leave out the possible need for historical tracking (you might want to add an effective date of some sort so that you know WHEN they were a spouse - Joe Celko has written some good stuff on temporal modeling, but I don't recall which book it was in at the moment). Otherwise if I got divorced and remarried you would lose that I had another spouse at another time - maybe that isn't important to you though.

Also, you might want to break up name into first_name, middle_name, last_name, prefix, suffix, etc.

Given those caveats...

CREATE TABLE People
(
     person_name     VARCHAR(100),
     CONSTRAINT PK_People PRIMARY KEY (person_name)
)
GO
CREATE TABLE Spouses
(
     person_name     VARCHAR(100),
     spouse_name     VARCHAR(100),
     CONSTRAINT PK_Spouses PRIMARY KEY (person_name),
     CONSTRAINT FK_Spouses_People FOREIGN KEY (person_name) REFERENCES People (person_name)
)
GO

If you wanted to have spouses appear in the People table as well then you could add an FK for that as well. However, at that point you're dealing with a bidirectional link, which becomes a bit more complex.

Tom H.
It was implied that spouses would be in the PERSON relation. If we use the bidirectional link you mention, how is it different from the alternate schema in the question?
Dave
I'm not sure how that solves the problem ...
BobbyShaftoe