views:

109

answers:

3

I have the following db-schema .

FILE, GROUP and BLOCK represent the object structure of the XML file. FILE is the root. GROUP has FK to FILE. BLOCK has the one FK to GROUP and the another one FK to UNIT.

UNIT groups "similar" BLOCKs from diffrent GROUPs in the context of FILE.

The data base is currently in 3NF. Yet I would like to know which UNITs belong to FILE.id=1. To do this yet, I have to make a query which joins all 4 tables. To optimize this schema, I can create the new relation UNIT n--FK-->1 FILE. Yet my query joins only two tables on the optimized db-schema. And here is the question: is this DB(with this new FK) still in 3 NF ? What the theory says?

BLOCK  n--FK-->1  GROUP  n--FK-->1  FILE
 n 
 |
 FK    
 |    
 1  
Unit

or

            +--------+
      +-----|  File  |.....+
      |     +--------+     .
      |                    .
     /|\                  /.\
 +--------+           +--------+
 | Group  |--+     +--|  Unit  |
 +--------+  |     |  +--------+
             |     |
            /|\   /|\
           +---------+
           |  Block  |
           +---------+
A: 

It is not clear how the UNIT table fits into the schema before you make changes.

Obviously, after you make changes, all you have to do to know which units belong to a file is join the FILE and UNIT tables.

Since tables are in 3NF when all the functional dependencies are determined by the keys, the whole keys, and nothing but the keys (so help me Codd), you have to look at your schema in that light.

Given the available information, most likely the tables are all in 3NF (and BCNF, and AFAICT in 4NF and 5NF too).

Jonathan Leffler
So help me Codd, I don't think you read the question very carefully.
NealB
A: 

I don't think your "crows foot" diagram supports the other dependencies outlined in your question. How did you come up with the 1:Many relationship between FILE and UNIT?

These are the functional dependencies that you describe...

  • GROUP -> FILE
  • BLOCK -> GROUP
  • BLOCK -> UNIT

Also, I assume that each of the above attributes functionally determine some additional attributes not appearing on the left hand side of any other functional dependency. These would be:

  • FILE -> other-file-attributes
  • GROUP -> other-group-attributes
  • BLOCK -> other-block-attributes
  • UNIT -> other-unit-attributes

Constructing a set of 3NF relations from the above functional dependencies gives:

  • FileRelation: (FILE, other-file-attributes)
  • GroupRelation: (GROUP, FILE, other-group-attributes)
  • UnitRelation: (UNIT, other-unit-attributes)
  • BlockRelation: (BLOCK, GROUP, UNIT, other-block-attributes)

This pretty much corresponds to what you have described.

Determining which UNIT instances relate to a given FILE requires a join of FileRelation to GroupRelation on FILE and then GroupRelation to BlockRelation on GROUP then BlockRelation to UnitRelation on UNIT.

You want to avoid this multi-table join by inserting a new relationship somewhere in the model that gives a direct mapping from UNIT to FILE. Such a relation implies the functional dependency:

  • UNIT -> FILE

This looks like the bit you added to the "crows foot" diagram. Adding this introduces a logical contradiction. The original schema supports having a given UNIT relating to multiple FILE instances. as in:

  • FileRelation(F1, ...)
  • FileRelation(F2, ...)
  • GroupRelation(G1, F1, ...)
  • GroupRelation(G2, F2, ...)
  • BlockRelation(B1, G1, U1, ...)
  • BlockRelation(B2, G2, U1, ...)
  • UnitRelation(U1, ...)

UNIT instance U1 relates to FILE instances F1 and F2. Given this situation either the UNIT -> FILE functional dependency cannot be supported or the original set of functional dependencies were incomplete and the schema is not in 3NF.

At this point you need to resolve whether the real world supports the FILE -> UNIT dependency or not. If it does, then the original model is not in 3NF and a bit more reworking of the schema is in order. If the dependency is not supported then the best you can say is:

  • FILE, UNIT -> nothing

and the corresponding relation:

  • FileUnit: (FILE, UNIT)

is a de-normalization because its content may be derived through existing tables functional dependancies.

=================================================================================

EDIT

Based on a number of comments made to this and other answers, it appears that:

  • UNIT -> FILE

is a true functional dependency, the functional dependency:

  • BLOCK -> UNIT

while not incorrect, must be redundant. I believe the correct 3NF set of relations for this model now is:

  • FileRelation: (FILE, other-file-attributes)
  • GroupRelation: (GROUP, FILE, other-group-attributes)
  • UnitRelation: (UNIT, FILE, other-unit-attributes)
  • BlockRelation: (BLOCK, GROUP, other-block-attributes)

Notice that the UNIT foreign key has dropped from the BlockRelation. This is because the UNIT -> FILE FD made it redundant. The (BLOCK, UNIT) relation is now formed by joining UnitRelation to FileRelation on FILE then FileRelation to GroupRelation on FILE then GroupRelation to BlockRelation on GROUP.

The original schema was not in 3NF due to the unstated functional dependency: UNIT -> FILE. The proposed set of relations above is in 3NF.

Note: When normalizing a schema, every functional dependency needs to be stated up front. Missing one can change the whole picture!

NealB
@NealB, I added the "crows-foot" diagram as an edit - the additional relationship (added as a dotted line) between FILE and UNIT comes from the original question's sentence, "To optimize this schema, I can create the new relation UNIT n--FK-->1 FILE".
Mark Bannister
@Mark Bannister: The crows-foot diagram corresponds to the "optomized" schema the OP is asking about so it makes no material difference to my answer.
NealB
@Mark Bannister: I updated my answer based on the determination that UNIT -> FILE is a true functional dependency.
NealB
@NealB, by removing the Block -> Unit dependancy as redundant, you are now stating that for a File, all Blocks that belong to that File (ie. all Blocks that belong to Groups that belong to that File) are associated with all Units that belongs to that file. However, from the OP's sample data, this is not so - Blocks 1 and 2 are only associated with Unit 1, while Block 3 is only associated with Unit 2. Therefore, the Block -> Unit dependancy is not redundant, and the OP's proposed schema is correctly normalised.
Mark Bannister
+1  A: 

From the information supplied, it appears that this is a true parallel hierarchy. On this basis, I believe that the proposed amended schema would still be normalised to 3NF.

Mark Bannister
The original model contained FD's such that a many:many relationshipexists between FILE and UNIT (by virtue of the BLOCK junction). The revised model shows a 1:many relationship between FILE and UNIT. This is a contradiction - something is wrong. Suppose the FILE/UNIT relationship is many:many. This information is already contained in the model, **unless there exist some set of attributes that depend entirely upon the specific FILE/UNIT instance**. If this is not the case, the new model contains redundant relationships and is no longer 3NF.
NealB
@NealB, the existing relationships imply, but do not force, a many to many relationship between FILE and UNIT. The addition of the File ID foreign key resolves this relationship to a **one to many** relationship - at this point, any UNIT can *only* have one FILE. This agrees with the (admittedly small) sample of data supplied by the OP in the fourth comment to the question.
Mark Bannister
I see your point - I was distracted by the term "optimized" and assumed the original schema already complete and 3NF with respect to all FD's. From this point of view, adding new relationships must be de-normalizing. Looks like the OP was completing the schema by adding the **UNIT** -> **FILE** FD. As you point out, adding this resolved the Many:Many **UNIT** to **FILE** relationship implied in the original model into a 1:Many. Good work +1
NealB