views:

117

answers:

3

See 'Almost Decided' below


NOTE: This is a continuation and simplification of Optional Database Entities.

I am designing a new laboratory database that tests a wide variety of tests on a wide variety of sample types.

Main Entities:
  Each must have exactly one parent (except REQ) and at least one child (except MEA).

 Request      REQ - the form
 Sample       SAM - the materials on the form to be tested
 Test         TST - the procedures to be performed on the sample
(Trial **     TRI - instance of duplicate methods for statistics)
 Measurement  MEA - a single measured number
 ** A Trial is optional. (see below)

Optional Trial Explanation
Many tests are a simple procedure with a few measurements. For example, "Add 10 mL 15% of KNO3 to sample then obtain density and pH."

However, some tests call for the same procedure to be performed on distinct portions of a sample. Lets use ballistics testing as an example. The requester might ask for the average exit velocity and accuracy for these 20 bullets. The sample is the set of 20 bullets. The test is "collect exit velocity and accuracy." The trials are the 20 individual shots being fired. The measurements are exit velocity and accuracy for each shot (trial).

QUESTION
How should I model the entities Test, Trial, and Measurement, since the Trial entity is optional?

Option 1: Use a "blank" trial entity as a placeholder if not needed.
Good: Parent entity always the same.
Bad: Trial entries exist even when not needed.

Option 2: Roll Trial into the Test table as a sub-test. A measurement would then always have a test as a parent.
Good: Single parent type for measurement (Test)
Bad: Multiple parent type for Test: Sample or Test

Option 3: A measurement still has one parent, but the parent could be either a test or a trial.
Good: Single parent type for Test (and Event when required)
Bad: Multiple parent type for Measurement: Test or Trial

Option 4: Trial as sub-entity. Measurement has required test_id and optional trial_num. Trial has a PK of (test_id, trial_num).
Good: No multiple parent type.
Bad: Not sure

Option X: Any other option not already mentioned.



Almost Decided: I now believe Option 4 (Trial as sub-entity) is the best. The following are the basic rules for Option 4. - A measurement always belongs to a test. - A trial only exists when needed. - Trial_num is set when multiple trials exist under a set. - Otherwise trial_num is null to indicate a trial is not needed.

Simple ER Diagram
-----------------
REQ <- SAM <- TST <- MEA
              ^        |  
              |        |  
              |-(TRI)<-|    

Table Keys
----------

 Table | PK              | FK
 ------+-----------------+----------------
 REQ   | REQ_id          |  
 SAM   | SAM_id          | REQ.PK
 TST   | TST_id          | SAM.PK
(TRI   | TST_id, TRI_num | TST.PK )
 MEA   | MEA_id          | TST.PK, TRI.PK*

* TRI.PK is null if trial entity is not needed.

Please offer any thoughts on why it is a good or bad option.

+1  A: 

From what I can gather from your explanation, this can be said:

A Test may have one or many Trials assocaited to it and a Trial may only be associated with one Test.

In which case, a Trial is a child entity of a Test. If that is true, then you might have two tables:

Test
Trial

The Trial table would have a foreign key field back to the Test table (which signifies the relationship). That way, each Trial would be associated with exactly one Test and each Test could have multiple related Trials.

Justin Niessner
Can a Measurement belong to more than one Test or Trial?
Justin Niessner
No. Every entity (except request) has exactly one parent and (except measurement) has at least one child.
Steven
Then it's possible to use a trick used by Microsoft in Microsoft Project. Your Measurement table would have two fields to identify the relationship. One would be parentId, and then a parentTypeId. parentId would be the ID of the parent element and parentTypeId would be an identifier to signify whether the parent is a Test or Trial (or any other type you need).
Justin Niessner
First, I would implement "exactly one parent in `Test` or `Trial`" in `Measurement` by having two FK,nullable columns `test_id` and `trial_id` with a check constraint on `test_id NOT NULL XOR trial_id NOT NULL`.
Steven
Second, you believe my "Option 3" (not 4) is best?
Steven
Partially. I wouldn't implement it the way you are talking though. Your method results in one field or the other having a null value. What I suggest leaves no nulls. parentId would be the id value of any kind of parent, and the parentIdType would be a flag telling you which entity the id belonged to.
Justin Niessner
@Steven - if every `Test` must have at least one `Trial`, why wouldn't you just make `Measurement` a child of `Trial`?
Jeff Sternal
Not all `Tests` need a `Trial`. An example is in my main question, "Optional Trial Explanation", first paragraph. I could force every `Test` to have a `Trial`, which is my Option 1.
Steven
+1  A: 

It isn't clear why it's necessary to represent a Trial as a distinct entity. Why isn't the following schema adequate?

Sample        Test           Measurement
------        ----           -----------
SampleId (PK) TestId (PK)    MeasurementId (PK)
Description   SampleId (FK)  TestId (FK)
              TestStartDate  Description
              TestEndDate    MeasuredValue

From what you've said so far, it sounds like you could just infer that a Test with more than one Measurement counts as a Trial. That is, if your user interface needs to show that a particular Test had a Trial, if could just do the following:

if (test.Measurements.Count > 1) {
    _View.Title = test.TestName + " (Trial)"; 
}

If that's not true, what attributes do you need that this schema is missing? What else needs to be in the Trial table that isn't available here?


Update: given the additional details, I would recommend introducing a new entity, which I'll call TestRun. A TestRun just groups one or more Measurements within a Test. Trials are now associated with TestRuns.

The resulting schema looks like this:

Sample        Test           TestRun         Measurement
------        ----           -------         -----------
SampleId (PK) TestId (PK)    TestRunId (PK)  MeasurementId (PK)
Description   SampleId (FK)  TestId (FK)     TestRunId (FK)
              TestStartDate                  Description
              TestEndDate                    MeasuredValue

Trial
-----
TrialId (PK)
TestRunId (FK)
Description

This is very close to Option 1 in the original question. If the cost of maintaining a blank (or dummy) Trial are low - for example, if Trials have few or no attributes - that may be a better solution.

Jeff Sternal
In those tests where a set of measurements is taken multiple times, a trial is a single set of those measurements. Examples: record temperature and pressure every 5 minutes for an hour, record velocity and accuracy of 30 bullets when shot, etc. In contrast, simple tests (as described in the original question) have multiple measurements but do not require multiple trials.
Steven
In a way, a `trial` is a member of an array of a particular set of measurements. In many cases, a `specification` must compare against the average (or other function) of a particular `measurement` rather than the individual `measurements`.
Steven
A: 

Here's a kite, borne aloft by assumption and interpretation.

A request comes with a sample and a specification of procedures to be carried out on the sample. If the sample is singular (a bottle of liquid) then this is a test and one set of measurements is collected. If the sample is multiple (a box of bullets, a time series) then this is a trial and a repeated set of measures is collected, one per sample instance.

In other words, the test/trial dichotomy is a red herring, and measurement is an intersection table between test and sample.

REQUEST
-------
RequestId (PK)
Specification

SAMPLE
------
RequestId (FK)
SampleId (PK)

TEST
----
RequestId (FK)
TestId (PK)
TestStartDate
TestEndDate

MEASUREMENT
-----------
TestId (FK)   
SampleId (FK) 
MeasurementDescription
MeasuredValue

Measurement(TestId,SampleId) is a compound unique key. Depending on how you feel about composites you may wish to define a surrogate MeasurementId (PK) as well.

Obviously there may be some further normalisation may be required. For example, you may want/need to split sample into two tables sample and sample_instance. It's hard to tell without knowing all the attribues involved.

APC