views:

958

answers:

5

I'm trying to create a relation where any of four different parts may be included, but any collection of the same parts should be handled as unique.

Example: An assignment must have an assigned company, may optionally have an assigned location, workgroup and program. An assignment may not have a workgroup without a location.

Let's assume we have companies A, B, C; locations X, Y, Z; workgroups I, J, K and programs 1, 2, 3.

So valid relations could include A - X - I - 1 A - Z - 2 B - Y C C - 3 B - Z - K

But invalid relations would include A - K (Workgroup without location) Y - K - 1 (No company)

So, to create my table, I've created

companyID INT NOT NULL,
FOREIGN KEY companyKEY (companyID) REFERENCES company (companyID),
locationID INT,
FOREIGN KEY locationKEY (locationID) REFERENCES location (locationID),
workgroupID INT,
FOREIGN KEY workgroupKEY (workgroupID) REFERENCES workgroup (workgroupID),
programID INT,
FOREIGN KEY programKEY (programID) REFERENCES program (programID),
UNIQUE KEY companyLocationWorkgroupProgramKEY (companyID, locationID, workgroupID, programID)

I figure this would handle all my relations besides the neccessity of an assignment to have a location if there is a workgroup (which I can happily do programatically or with triggers, I think)

However, when I test this schema, it allows me to enter the following...

INSERT INTO test VALUES (1, null, null, null), (1, null, null, null);

...without complaint. I'm guessing that (1, null, null, null) does not equal itself because nulls are included. If this is the case, is there any way I can handle this relation?

Any help would be appreciated!

+2  A: 

The only way I can think of handling this without additional triggers/programming would be to have a single "None of the Above" value in each of the referenced tables, so that your test would look like

INSERT INTO test VALUES (1, NO_LOCATION, NO_WORKGROUP, NO_PROGRAM),
                        (1, NO_LOCATION, NO_WORKGROUP, NO_PROGRAM)

Where the NO_* identifiers are the right type/length for your ID columns. This would then fail, as you'd expect it.

Harper Shelby
+2  A: 

This is a Feature (though not what I expected, either).

This thread suggests making your key a Primary key to get the behavior you expected:

This is a feature - a NULL value is an undefined value, therefore two NULL values are not the same. Can be a little confusing but makes sense when you think about it.

A UNIQUE index does ensure that non-NULL values are unique; you could specify that your column not accept NULL values.

Michael Haren
Setting the columns to be "NOT NULL" would work, but would also require the addition of data items like Rob and I suggested to allow the database to correctly match the requirements listed - if NULL isn't allowed, you need a 'NO FOO' value that's a valid entry in the FOO table the FK refers to.
Harper Shelby
A: 

In MySQL NULL != NULL, or anything. So that is what the UNIQUE doesn't work. You should use another default value for blanks, like zero

Rob
A: 

Thanks for all the super-fast responses!

Guess I'll have to either do this programatically or go the trigger route.

Rob and I *both* mentioned a way to do this without programming or triggers. Defining a "No Location", "No Workgroup", and "No Program" value in the respective tables would permit you to use your table as defined.
Harper Shelby
No problem. Harper's solution is fine, too.
Michael Haren
Or Rob--they're the same thing.
Michael Haren
I'm a new guy here, so I'm not entirely sure of all the etiquette here. And, you all answered the question but it won't let me flag all of your answers as accepted.
Drew: I'm not concerned about the 'accepted' at all - I just want to make sure you realize you *can* use your existing design by adding a bit of 'guard' data into your database. From a referential integrity point of view, it's the Right Thing (TM) to do.
Harper Shelby
But yeah, entering the "null" values would be other headaches as I'd have to programatically check for their existance each time I pulled them up. Programmatically/trigger double checking will result in less complexity for my system, I think...
I don't see why you would - your system rules explicitly list "No Location" as a valid option, so having those values show up would make sense from that standpoint. Why would you need to programatically check for the existence of these values?
Harper Shelby
Because while "No Location" is a valid option, it won't have valid data. You can't send product to "No Location" and "No Workgroup" doesn't have a manager you can call up, for example.
How does handling "No Location" in your program differ from handling "NULL"? I can't see how this change would make the programming any more difficult, and it saves you a lot of verification code in your DAL.
Harper Shelby
Fair point. I'll have to roll around the options in my head. Just sort of annoying that I'll have to add a useless row to do what should be doable without it. Thanks again for all the help!
A: 

I think it's important to note that there is a proper way for NULL values to be interpreted and handled, and the behavior exhibited by the OP is exactly what's intended. You can disregard that behavior, and you can handle your query any way you want without objection from me, but it might be well to "Accept" an answer that describes some form of Best Practices, rather than a non-standard personal preference.

Or if you don't agree with the consensus Best Practice, you can just not Accept any answer.

It's not a race to get an answer accepted as quickly as possible. Deliberation and collaboration are also intended to be part of the process, I think.

le dorfier