views:

287

answers:

5

Hi,
I have a tricky problem that I've been messing about with for a few days now and cant find an optimal solution for.

These are my tables:

  • site
  • site_node
  • page

The site node table contains a list of nodes representing a hierarchy (using nested set). Each node must have one or more associated pages. Each site must have one associated error page and one not found page.

So, a page must either belong to a node, or a site as an error or not found page. The solutions I'm currently toying with are:

  1. parentType and parentId fields on the page table, where type would either be "node", "site_error" or "site_notFound" and the id would be the site or node id (whichever is relevant to the type).
  2. nodeId field on the page table that can be null, and then errorPageId and notFoundPageId fields on the site table.

Option #1 ensures that each page belongs to one and only one other entity, although the relationship cant actually be enforced as the parentId field can point to more than one place.

Option #2 is cleaner, but it's basically saying that the site "belongs" to the two error and not found pages, and that's probably bad practice.

Any thoughts or suggestions?
Thanks,
Jack

+3  A: 

Make dummy site nodes for the error or not found page. You can mark them as a specific type of node as per your first option. This will make it easier to make a generic handler mechanism. It will also make joins simpler which will help with database query performance. Additionally, it allows you to add more types of 'special' pages (perhaps a login screen) or make this configurable without having to modify your database schema.

ConcernedOfTunbridgeWells
A: 

Another option is to have 2 columns site_id and site_node_id like this:

create table pages
 ( page_id ... primary key
 , site_id references sites
 , site_node_id references site_nodes
 , ...
 , constraint site_or_node check (  site_id is null and site_node_id is not null
                                 or site_id is not null and site_node_id is null
                                 )
 );

Now you can use referential integrity to ensure that every page belongs to either a site or a node, and not both.

Tony Andrews
+2  A: 

Option #1 ensures that each page belongs to one and only one other entity, although the relationship cant actually be enforced as the parentId field can point to more than one place.

Right. In terms of relational theory, the problem is that your "parentId" column violates third normal form, because its meaning varies per row, based on the value in parentType (a non-key column).

You wouldn't have a properly designed database if a single column could contains someone's phone number or their birthdate, per row, depending on some other flag. Those are two different facts about the person, and they each deserve their own column. Likewise, storing both site_id or node_id in a single column would have the same problem.

Another clue that this is a flawed design is that you can't declare a foreign key constraint to point to either of two referenced tables.

Option #2 is cleaner, but it's basically saying that the site "belongs" to the two error and not found pages, and that's probably bad practice.

I see why you're saying that, because of the belongs to conventions in Rails-like frameworks. But these are conventions; they aren't necessarily the only relationship that foreign keys can model. You can make one entity refer to exactly one other entity, in a has one relationship. In this case, the foreign key reverses direction.

I would say it's logically true that the Error page and the Not Found page belong to the site, not the other way around. And the way to make them mandatory is to have another entity reference these pages, and apply the NOT NULL constraint to these references. This is what you've described.

CREATE TABLE site (
  . . .
  error_page_id     INT NOT NULL,
  notfound_page_id  INT NOT NULL,
  FOREIGN KEY (error_page_id)    REFERENCES pages (page_id),
  FOREIGN KEY (notfound_page_id) REFERENCES pages (page_id)
);

This meets your immediate need, it's enforceable, and it's in Normal Form.


@NXC suggests making dummy nodes for Error and Not Found pages. Although this allows these nodes to be stored in the node hierarchy, it fails to enforce that a site must have these pages. That is, a site could be stored without references to these nodes.

@Tony Andrews suggests storing two columns in each page, site_id and site_node_id, and adding a CHECK constraint to ensure that exactly one of these is non-NULL. This seems better than the parent_id/parent_type option, but it still doesn't offer any enforcement that every site must have an Error and a Not Found page.

Bill Karwin
Thanks, that does make the most sense. As the pages cant be deleted via a cascade delete with that relationship (as the FK is reversed), would you suggest using a trigger to delete the linked error pages, or something else?
Jack Sleight
Probably a trigger if you want to do it automatically. But you might want a given error page to be used by multiple sites, so in that case maybe you don't want it to be automatic.
Bill Karwin
Thanks Bill, going to go with this option. Good point about the trigger, need to decide how I want to do that.
Jack Sleight
What happens when you MUST have term of service page and a registration page and something else page? This sounds like you need a new table "required_pages".
jmucchiello
@joe: Yes I qualified my answer with, "this meets your immediate need." If you have more mandatory pages, of course you need more columns. I don't see a need to store this in a separate table. The `sites` table is not likely to be large, or to be subject to frequent updates.
Bill Karwin
A: 

Option 2 makes much more sense and will save your brain later down the line if and when more complications spring up. The one-to-one relationship of site to error/notfound page makes it perfect for a foreign key constraint.

notnot
A: 

A modification of Option 1.

Include two separate columns, ParentNodeID and ParentSiteID. Leave one of these two columns NULL, depending on the case. Now you can still declare a foreign key (references) constraint for each foreign key.

I don't really understand the SiteNotFound case. Could you leave both foreign keys NULL in this case?

Your joins and searches will be simpler. You will also be adhering to 1NF. This is not a coincidence.

Your option 1 combines values drawn from different domains in a single field. This is bad field design and IIRC violates 1NF.

Walter Mitty