Suppose you want to record in a database that something is 30 meters long, or 50 feet, or that a temperature was 50 kelvin, or that a speed was 50 kilometers per hour. How would you represent the units?

To clarify, two points:

  • Any kind of unit, not a predefined, well-defined subset of them.
  • My question is really about whether an ontology of units exists. I used the database example because it was the first that came to mind, but representing the unit in XML or JSON is just as relevant.
A: 

Do you have a specific reason to store quantities in different types of units, instead of converting into some "canonical" units (e.g., the metric system)? When inserting data, you'd convert the input quantity into the canonical unit. And when reading data, you'd convert into whatever output unit you need.

This approach is simpler in many ways than storing data in different units, but you lose the information about the original unit in which the data was specified.
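A minimal Python sketch of the convert-on-write, convert-on-read approach described above. The `UNITS` table, its unit symbols, and the helper names are hypothetical; a real system would need a much larger table or a units library. Each entry stores a factor and an offset because temperature scales such as Celsius need both.

```python
# Hypothetical conversion table: unit symbol -> (dimension, factor, offset),
# meaning canonical_value = value * factor + offset, in the SI unit of that dimension.
UNITS = {
    "m":    ("length", 1.0, 0.0),
    "ft":   ("length", 0.3048, 0.0),
    "km/h": ("speed", 1000.0 / 3600.0, 0.0),
    "m/s":  ("speed", 1.0, 0.0),
    "K":    ("temperature", 1.0, 0.0),
    "degC": ("temperature", 1.0, 273.15),
}


def to_canonical(value: float, unit: str) -> float:
    """Convert a value given in `unit` to the SI unit of its dimension (used on insert)."""
    _, factor, offset = UNITS[unit]
    return value * factor + offset


def from_canonical(value: float, unit: str) -> float:
    """Convert a stored SI value back into `unit` (used on read)."""
    _, factor, offset = UNITS[unit]
    return (value - offset) / factor


stored = to_canonical(50, "ft")                 # what would go into the database
print(round(stored, 4))                         # 15.24
print(round(from_canonical(stored, "ft"), 6))   # 50.0
```

Note that `stored` alone no longer says the user typed "50 ft", which is exactly the information loss the answer warns about.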

Igor Ostrovsky
That's the problem. I need to preserve the information about the original unit.
Stefano Borini
A: 

I would include the units in the column name (e.g. LengthInMeters, WeightInKilograms, AnnoyingnessInFishSlapsPerSecond etc.), and then just store the numbers in the column.

Ideally, you could define the unit as a (proper) property of the column, but I don't know of any database that allows this. With the unit included in the column name, future developers are unlikely to be confused about which unit a column holds.

I've run into DB solutions that put the unit in a second column, but since there's no standardized way of representing units, this ends up being either a text field with values like "ft.", "feet", "Feet", etc., or else an FK to a table that stores the possible units (also as text). Either way, running SUM or AVG queries (or any calculation) becomes a nightmare, especially if you allow values with different units to be stored in the same column.
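A small sketch of that pitfall using Python's built-in `sqlite3`, with hypothetical table and column names. A plain `SUM` over a mixed-unit column silently adds meters and feet; a lookup table that attaches a conversion factor to every spelling makes the sum computable, but only for spellings someone remembered to register.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the unit lives in a second, free-text column.
conn.execute("CREATE TABLE measurement (value REAL, unit TEXT)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    [(30, "m"), (50, "ft."), (10, "Feet")],
)

# Naive aggregation mixes meters and feet into one meaningless number.
naive = conn.execute("SELECT SUM(value) FROM measurement").fetchone()[0]
print(naive)  # 90.0

# Hypothetical lookup table mapping every spelling to a factor into meters.
conn.execute("CREATE TABLE unit (name TEXT PRIMARY KEY, to_metres REAL)")
conn.executemany(
    "INSERT INTO unit VALUES (?, ?)",
    [("m", 1.0), ("ft.", 0.3048), ("Feet", 0.3048)],
)
total_m = conn.execute(
    "SELECT SUM(ms.value * u.to_metres) "
    "FROM measurement ms JOIN unit u ON u.name = ms.unit"
).fetchone()[0]
print(round(total_m, 3))  # 48.288
```

Because the query uses an inner join, a row whose spelling (say "feet") is missing from `unit` would silently drop out of the total.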

MusiGenesis
That's not easy as the user can define any arbitrary unit. I think that my question boils down to the existence (or lack thereof) of an ontology of units.
Stefano Borini
+4  A: 

One of the fundamental concepts of relational database design is that all values in a given column should represent some logically compatible type of data. Formally, a column has exactly one type, and any two values of that type can be compared to each other in an equality predicate. This is a crucial part of type theory.

So if the measurements are not comparable, i.e. length vs. temperature, you shouldn't store them in the same column.
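As an analogy in Python (not a feature of any particular database), here is a hypothetical `Quantity` type that carries its dimension. Equality is defined only between quantities of the same dimension, so comparing a length with a temperature is rejected outright instead of quietly returning `False`.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class Quantity:
    """A value in a canonical SI unit, tagged with its dimension (hypothetical type)."""
    value: float
    dimension: str  # e.g. "length", "temperature"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.dimension != other.dimension:
            # Mirrors the rule that only values of the same type may be compared.
            raise TypeError(f"cannot compare {self.dimension} with {other.dimension}")
        return self.value == other.value


a = Quantity(30.0, "length")
b = Quantity(30.0, "length")
t = Quantity(50.0, "temperature")
print(a == b)  # True
try:
    a == t
except TypeError as err:
    print(err)  # cannot compare length with temperature
```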

You might want to look at ISO 2955, "Information processing - Representation of SI and other units in systems with limited character sets."

Also see "Joe Celko's SQL Programming Style," chapter 4, Scales and Measurements.

Bill Karwin
Interesting. Very interesting. Thanks
Stefano Borini
+1  A: 

Relational theory has it that each relvar ("table") has an associated predicate that defines the meaning of the tuples therein. That predicate ought to be part of the formal documentation of the database, such that no one who actually consults the documentation can have any excuse for "having misunderstood something" (unless the documentation is incomplete of course).

Including the definition of units in that predicate (e.g. "The length of person ... is FEET.", "The measured temperature was ... KELVIN", ...) achieves that completeness and avoids having to resort to those rather ugly attribute ("column") names.

I don't understand why "just storing the numbers" (in a standard unit that is agreed upon by all users) would be "not easy".

If foobaricity exists as a unit, and someone comes up with a new unit, fluffyperception, then that person will first have to formally establish the correspondence between quantities of foobaricity and quantities of fluffyperception anyway, or nothing they state can be understood by anyone.
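To make that point concrete, a minimal Python sketch of a hypothetical unit registry that refuses any new unit not defined in terms of one it already knows. The names `REGISTRY`, `define_unit` and `convert` are illustrative only, and the sketch handles purely multiplicative units (no offsets).

```python
# Hypothetical registry: unit name -> (base unit, factor relative to that base).
REGISTRY: dict[str, tuple[str, float]] = {"foobaricity": ("foobaricity", 1.0)}


def define_unit(name: str, in_terms_of: str, factor: float) -> None:
    """Register `name` as `factor` times `in_terms_of`; reject units with no stated correspondence."""
    if in_terms_of not in REGISTRY:
        raise KeyError(f"{in_terms_of!r} is not a known unit")
    base, base_factor = REGISTRY[in_terms_of]
    REGISTRY[name] = (base, factor * base_factor)


def convert(value: float, src: str, dst: str) -> float:
    """Convert between two registered units that share a base unit."""
    src_base, src_factor = REGISTRY[src]
    dst_base, dst_factor = REGISTRY[dst]
    if src_base != dst_base:
        raise ValueError(f"{src} and {dst} are not commensurable")
    return value * src_factor / dst_factor


define_unit("fluffyperception", "foobaricity", 2.5)  # 1 fluffyperception = 2.5 foobaricity
print(convert(4, "fluffyperception", "foobaricity"))  # 10.0
```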

EDIT

I saw this added: "I need to preserve the information about the original unit."

Nothing stops you from doing that: add two extra columns (original quantity and original unit name) alongside the "canonicalized" value. You can constrain "original unit name" as strictly or as loosely as you want.
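A sketch of that layout with Python's built-in `sqlite3`, assuming a hypothetical `length_measurement` table and a hypothetical three-unit conversion table. The `CHECK` constraint is the "strict" option; dropping it gives the "lax" one. Aggregates run on the canonical column while the original input is kept verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: canonical value for querying, plus the input as entered.
conn.execute("""
    CREATE TABLE length_measurement (
        length_m      REAL NOT NULL,  -- canonicalized value, always meters
        original_qty  REAL NOT NULL,  -- value exactly as entered
        original_unit TEXT NOT NULL
            CHECK (original_unit IN ('m', 'ft', 'km'))  -- strict variant; drop for a lax one
    )
""")

TO_METRES = {"m": 1.0, "ft": 0.3048, "km": 1000.0}  # hypothetical conversion table


def insert_length(qty: float, unit: str) -> None:
    """Store the canonical value alongside the original quantity and unit."""
    conn.execute(
        "INSERT INTO length_measurement VALUES (?, ?, ?)",
        (qty * TO_METRES[unit], qty, unit),
    )


insert_length(30, "m")
insert_length(50, "ft")

total = conn.execute("SELECT SUM(length_m) FROM length_measurement").fetchone()[0]
originals = conn.execute(
    "SELECT original_qty, original_unit FROM length_measurement ORDER BY rowid"
).fetchall()
print(round(total, 3))  # 45.24
print(originals)        # [(30.0, 'm'), (50.0, 'ft')]
```

With the strict constraint in place, inserting a row whose unit is, say, 'furlong' fails with `sqlite3.IntegrityError`.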

Erwin Smout