views:

125

answers:

3

Problem:

A relational database (Postgres) storing timeseries data of various measurement values. Each measurement value can have a specific "measurement type" (e.g. temperature, dissolved oxygen, etc) and can have specific "measurement units" (e.g. Fahrenheit/Celsius/Kelvin, percent/milligrams per liter, etc).

Question:

Has anyone built a similar database such that dimensional integrity is conserved? Have any suggestions?

I'm considering building a measurement_type and a measurement_unit table, both of these would have text two columns, ID and text. Then I would create foreign keys to these tables in the measured_value table. Text worries me somewhat because there's the possibility for non-unique duplicates (e.g. 'ug/l' vs 'µg/l' for micrograms per liter).

The purpose of this would be so that I can both convert and verify units on queries, or via programming externally. Ideally, I would have the ability later to include strict dimensional analysis (e.g. linking µg/l to the value 'M/V' (mass divided by volume)).

Is there a more elegant way to accomplish this?

A: 

"Text worries me somewhat because there's the possibility for non-unique duplicates"

Right. So don't use text as a key. Use the ID as a key.

"Is there a more elegant way to accomplish this?"

Not really. It's hard. Temperature is it's own problem because temperature is itself an average, and doesn't sum like distance does; plus F to C conversion is not a multiply (as it is with every other unit conversion.)

S.Lott
+2  A: 

I produced a database sub-schema for handling units an aeon ago (okay, I exaggerate slightly; it was about 20 years ago, though). Fortunately, it only had to deal with simple mass, length, time dimensions - not temperature, or electric current, or luminosity, etc. Rather less simple was the currency side of the game - there were a myriad different ways of converting between one currency and another depending on date, currency, and period over which conversion rate was valid. That was handled separately from the physical units.

Fundamentally, I created a table 'measures' with an 'id' column, a name for the unit, an abbreviation, and a set of dimension exponents - one each for mass, length, time. This gets populated with names such as 'volume' (length = 3, mass = 0, time = 0), 'density' (length = 3, mass = -1, time = 0) - and the like.

There was a second table of units, which identified a measure and then the actual units used by a particular measurement. For example, there were barrels, and cubic metres, and all sorts of other units of relevance.

There was a third table that defined conversion factors between specific units. This consisted of two units and the multiplicative conversion factor that converted unit 1 to unit 2. The biggest problem here was the dynamic range of the conversion factors. If the conversion from U1 to U2 is 1.234E+10, then the inverse is a rather small number (8.103727714749e-11).

The comment from S.Lott about temperatures is interesting - we didn't have to deal with those. A stored procedure would have addressed that - though integrating one stored procedure into the system might have been tricky.

The scheme I described allowed most conversions to be described once (including hypothetical units such as furlongs per fortnight, or less hypothetical but equally obscure ones - outside the USA - like acre-feet), and the conversions could be validated (for example, both units in the conversion factor table had to have the same measure). It could be extended to handle most of the other units - though the dimensionless units such as angles (or solid angles) present some interesting problems. There was supporting code that would handle arbitrary conversions - or generate an error when the conversion could not be supported. One reason for this system was that the various international affiliate companies would report their data in their locally convenient units, but the HQ system had to accept the original data and yet present the resulting aggregated data in units that suited the managers - where different managers each had their own idea (based on their national background and length of duty in the HQ) about the best units for their reports.

Jonathan Leffler
A: 

A note about conversions: a lot of units are linearly related, and can be converted using a formula like "y = A + Bx", where A and B are constants which could be stored in the database for each pair of units that you need to convert between. For example, for Celsius to Farenheit the constants are A=32, B=1.8.

However, there are also rare exceptions. Converting between logarithmic and non-logarithmic units, for example. Or converting between mass-per-volume and molar-mass-per-volume (in which case you would need to know the molar mass of the compound being measured).

Of course, if you are sure that all the conversions required by the system are linear, then there's no need for over-engineering, just store the two constants. You can then extract standardized results from the database using straight SQL joins with calculated fields.

Todd Owen