views:

129

answers:

7

I have the following tables:

Cateogories

  • CategoryID (int) Primary Key
  • CategoryName (varchar)

Items

  • ItemID (int) Primary Key
  • CategoryID (int)
  • ItemName (varchar)

There is a foreign key constraint on Items.CategoryID. There is a chance that when a new item is created that there will be no category assigned.

Is it better to set Items.CategoryID to allow nulls and deal with the nulls in my code OR better to not allow nulls, set the default CategoryID to 1, and create a dummy record in the Categories table called "Uncategorized" and then deal with that dummy category in my code?

A: 

SQL NULLs are tricky, so I think you're better off with a sentinel category.

Tobu
Almost every bullet point in that list is easy to explain mathematically and/or **is** explained in the SQL spec. This is a site for answering questions, not posting gripes about how standards specs don't work exactly the way you think they should.
Aaronaught
I don't need to gripe about the standard — a very useful standard overall — but it's fair to say that some parts of it should be avoided. “Easy to explain mathematically”, this part isn't.
Tobu
The NULLs ANSI spec is inconsistent and commonly considered broken. Most developers that think they understand NULLs really don't.
KenFar
A: 

In this case I believe it's really a matter of personal preference. Either way you'll have to deal with the uncategorized items in your code.

BD
+6  A: 

The logically correct way would be for the CategoryID column to be NULL when there is no Category for the item.

If you get trapped by any of the gotchas that are associated with using NULL, then that is most likely a sign that the design hasnt taken account of the fact that items cannot have a category. Fix the design. The NULL will ensure you stick to solving the correct problem.

Mongus Pong
An item in this context will always have a category even if that category is "Miscellaneous" or similar. NULL should only be used where a field has the logical possibility of a no-data state e.g. date of death for a person that is still alive.
melkisadek
The poster says that there are times when an item has No category. That is different from it having a miscellaneous or other dummy category.Do you have some information that I dont?
Mongus Pong
One more reason to allow NULLs for no category - if you have drop down lists of categories, then users will end up seeing this dummy category there unless you hard-code to exclude it from the drop downs, requiring special coding anyway.
Tom H.
"...there will be no category assigned." is semantically different to "there are times when an item has No category". My initial reading of the question was that the intention was to enforce the category but that users might not enter a category where one wasn't already available (e.g. for a new item type). NULLs should be used only where there is no alternative - in this case there is a very clear and logically consistent alternative which solves the problem as described. A category of miscellaneous is both logically and semantically correct in this case.
melkisadek
What is the deal with all this `NULL` hate? Methinks some people don't code defensively.
Aaronaught
+1  A: 

Ideally you'd want to force a category choice before allowing an item to be created. If an item will have no category at any point in the future then you'll need to create a category specifically to deal with that. I personally wouldn't call it "Uncategorized" though as this implies that a user can just chase it up later - which they will forget to do with alarming regularity!

Go for logical consistency or you'll end up in a mess. If that means creating a "Miscellaneous" category then do that and make sure that (a) Users know when to use it and (b) It is reported on regularly to make sure items are categorised correctly.

melkisadek
+2  A: 

It depends:

If your items really have no category, then I would allow NULLs, as that is what you have: no CategoryId.

If you want to list all categories, you do not want to display the dummy row, so you would have to ignore that.

If you want to display all items and show the categories, you'd better be aware that there are items without category, so you would use a LEFT JOIN in that case.

If possible, change your application to select a category before actually saving your item.


If you want to treat that Uncategorized category just like the other categories (list them with the other categories, count items assigned to it, select it in lists/dropdowns), then it should get it's own category, and Item.CategoryId should be NOT NULL.

Peter Lang
A: 

I do not believe that either of the alternatives are very good.

If you choose the NULL approach you will have problems with the gotchas involved in working with NULLs. If you choose to not allow nulls, you will need to handle cases where if you delete a category the item would cascade.

IMO the best design is to have three tables.

Categories
ID
Name

Items
ID
Name

Categories2Items
CategoryID
ItemID

This eliminates the need for NULL (and the gotchas involved) as well as allows you to have uncategorized items, and items which belong to several categories. This design is also in Boyce-Codd Normal form which is always a good thing .. en.wikipedia.org/wiki/BCNF

eckesicle
Then you get multiple categories per item, call them tags. It's a good design, but is that what the OP wants?
Tobu
I should have mentioned that it was a one to one for items to categories in this case. Your answer is appropriate for a many to many situation. For a one to one, this over complicates the SQL and requires you to enforce the one to one in the application (messy). My original design is valid to at least 3NF.
Developr
Yes, I am aware. But the design is extensible and eliminates a number of problems in both designs. Additionally if I may be a nitpick, categories are not per definition disjoint.However, if this is not what the OP wants then choosing the NULL approach is clearly better than having a sentinel category. Removal of a category of which an item belongs would require either a CASCADE to remove the item, which is probably not a good thing, triggers that not all DBMS support or to leave a NULL value as the categoryid. Then OP would have both NULL-categorized and Uncategorized items in their DB.
eckesicle
+1  A: 

For simple lookup tables of this type it is almost always better to disallow NULLs and have the unknown value in your lookup table.

Why?

  • Because the ANSI NULL specifications are inconsistent and very complex. Dealing with nulls greatly increases the likelihood of coding defects, and takes a lot more code to write
  • Because few developers really understand how NULLs work in all scenarios
  • Because it simplifies your model and queries nicely. You can join things together nicely with inner joins from either direction with very simple sql.

However, a few cautions:

  • You may want more than one "dummy" value: one for "unknown" and another for "not assigned". Of course, NULL bundles both into a single value, so you're going above & beyond the minimal standard if you do this.
  • You will end up sometimes having additional non-key attributes that either must be nullable or carry 'n/a' type values for the dummy rows. For heavily denormalized lookup tables (like warehousing dimensions) you'll probably want nulls allows for these columns because 'n/a' doesn't work well for timestamps, amounts, etc.
  • If you apply this technique to more than just simple lookup tables it will dramatically complicate your design. Don't do that.
KenFar