views:

164

answers:

6

Hello, StackOverflow crowd. I have a very open-ended software design question.

I've been looking for an elagant solution to this for a while and I was wondering if anyone here had some brilliant insight into the problem. Consider this to be like a data structures puzzle.

What I am trying to do is to create a unit converter that is capable of converting from any unit to any unit. Assume that the lexing and parsing is already done. A few simple examples:

Convert("days","hours")           // Yields 24
Convert("revolutions", "degrees") // Yields 360

To make things a little more complicated, it must smoothly handle ambiguities between inputs:

Convert("minutes","hours")        // Yields (1/60)
Convert("minutes","revolutions")  // Yields (1/21600)

To make things even more fun, it must handle complex units without needing to enumerate all possibilities:

Convert("meters/second","kilometers/hour")
Convert("miles/hour","knots")
Convert("Newton meters","foot pounds")
Convert("Acre feet","meters^3")

There's no right or wrong answer, I'm looking for ideas on how to accomplish this. There's always a brute force solution, but I want something elegant that is simple and scalable.

A: 

I would start by choosing a standard unit for every quantity(eg. meters for length, newtons for force, etc) and then storing all the conversion factors to that unit in a table

then to go from days to hours, for example, you find the conversion factors for seconds per day and seconds per hour and divide them to find the answer.

for ambiguities, each unit could be associated with all the types of quantities it measures, and to determine which conversion to do, you would take the intersection of those two sets of types(and if you're left with 0 or more than one you would spit out an error)

Bwmat
A good start, but ignores the complexity of the question.
Stargazer712
+4  A: 

I would start with a hashtable (or persisted lookup table - your choice how you implement) that carries unit conversions between as many pairs as you care to put in. If you put in every possible pair, then this is your brute force approach.

If you have only partial pairs, you can then do a search across the pairs you do have to find a combination. For example, let's say I have these two entries in my hashtable:

Feet|Inches|1/12
Inches|Centimeters|2.54

Now if I want to convert feet to centimeters, I have a simple graph search: vertices are Feet, Inches, and Centimeters, and edges are the 1/12 and 2.54 conversion factors. The solution in this case is the two edges 1/12, 2.54 (combined via multiplication, of course). You can get fancier with the graph parameters if you want to.

Another approach might be applying abductive reasoning - look into AI texts about algebraic problem solvers for this...

Edit: Addressing Compound Units

Simplified problem: convert "Acres" to "Meters^2"

In this case, the keys are understanding that we are talking about units of length, so why don't we insert a new column into the table for unit type, which can be "length" or "area". This will help performance even in the earlier cases as it gives you an easy column to pare down your search space.

Now the trick is to understand that length^2 = area. Why not add another lookup that stores this metadata:

Area|Length|Length|*

We couple this with the primary units table:

Meters|Feet|3.28|Length
Acres|Feet^2|43560|Area

So the algorithm goes:

  • Solution is m^2, which is m * m, which is a length * length.
  • Input is acres, which is an area.
  • Search the meta table for m, and find the length * length mapping. Note that in more complex examples there may be more than one valid mapping.
  • Append to the solution a conversion Acres->Feet^2.
  • Perform the original graph search for Feet->M.

Note that:

  • The algorithm won't know whether to use area or length as the basic domain in which to work. You can provide it hints, or let it search both spaces.
  • The meta table gets a little brute-force-ish.
  • The meta table will need to get smarter if you start mixing types (e.g. Resistance = Voltage / Current) or doing something really ugly and mixing unit systems (e.g. a FooArea = Meters * Feet).
Greg Harman
I've done this approach, and it works well, but you need to take care to not traverse a conversion path with a previously converted amount, since the errors accumulate pretty rapidly.
codekaizen
Agreed, you have to avoid loops and things can get even more complex if you allow parallel edges (not sure how that applies to a straight unit conversion, but maybe there are e.g. callouts to external functions with different levels of accuracy).
Greg Harman
This answer is getting close, but would still have trouble with the Acre feet -> meters^3 solution. There's no assumption that the units will be in the same "style".
Stargazer712
Added an edit to start to address this - that rabbit hole goes pretty deep...
Greg Harman
Thank you for your some of your insight (as well as everyone here). Its nice to sit back and "take it in". Like I said, there's no right or wrong answer, but it is nice to hear how others look at the problem.
Stargazer712
I found this by accident and happen to be writing an app that needs to do this. Great info, thanks a bunch +1
Tom Dignan
A: 

I assume that you want to hold the data about conversion in some kind of triples (fstUnit, sndUnit, multiplier).

For single unit conversions: Use some hash functions in O(1) to change the unit stucture to a number, and then put all multipliers in a matrix (you only have to remember the upper-right part, because the reflection is the same, but inversed).

For complex cases: Example 1. m/s to km/h. You check (m,km) in the matrix, then the (s,h), then multiply the results. Example 2. m^3 to km^3. You check (m,km) and take it to the third power.

Of course some errors, when types don't match like field and volume.

Alistra
A: 

You can make a class for Units that takes the conversion factor and the exponents of all basic units (I'd suggest to use metric units for this, that makes your life easier). E.g. in Pseudo-Java:


public class Unit {
   public Unit(double factor, int meterExp, int secondExp, int kilogrammExp ... [other base units]) {
     ...
   }
}

//you need the speed in km/h (1 m/s is 3.6 km/h):
Unit kmPerH = new Unit(1 / 3.6, 1, -1, 0, ...)

Landei
+2  A: 

Whatever structure you choose, and your choice may well be directed by your preferred implementation (OO ? functional ? DBMS table ?) I think you need to identify the structure of units themselves.

For example a measurement of 1000km/hr has several components:

  • a scalar magnitude, 1000;
  • a prefix, in this case kilo; and
  • a dimension, in this case L.T^(-1), that is, length divided by time.

Your modelling of measurements with units needs to capture at least this complexity.

As has already been suggested, you should establish what the base set of units you are going to use are, and the SI base units immediately suggest themselves. Your data structure(s) for modelling units would then be defined in terms of those base units. You might therefore define a table (thinking RDBMS here, but easily translatable into your preferred implementation) with entries such as:

unit name      dimension                 conversion to base

foot           Length                    0.3048
gallon(UK)     Length^3                  4.546092 x 10^(-3)
kilowatt-hour  Mass.Length^2.Time^(-2)   3.6 x 10^6

and so forth. You'll also need a table to translate prefixes (kilo-, nano-, mega-, mibi- etc) into multiplying factors, and a table of base units for each of the dimensions (ie meter is the base unit for Length, second for Time, etc). You'll also have to cope with units such as feet which are simply synonyms for other units.

The purpose of dimension is, of course, to ensure that your conversions and other operations (such as adding 2 feet to 3.5 metres) are commensurate.

And, for further reading, I suggest this book by Cardarelli.

EDIT in response to comments ...

I'm trying to veer away from suggesting (implementation-specific) solutions so I'll waffle a bit more. Compound units, such as kilowatt-hours, do pose a problem. One approach would be to tag measurements with multiple unit-expressions, such as kilowatt and hour, and a rule for combining them, in this case multiplication I could see this getting quite hairy quite quickly. It might be better to restrict the valid set of units to the most common ones in the domain of the application.

As to dealing with measurements in mixed units, well the purpose of defining the Dimension of a unit is to provide some means to ensure that only sensible operations can be applied to measurements-with-units. So, it's sensible to add two lengths (L+L) together, but not a length (L) and a volume (L^3). On the other hand it is sensible to divide a volume by a length (to get an area (L^2)). And it's kind of up to the application to determine if strange units such as kilowatt-hours per square metre are valid.

Finally, the book I link to does enumerate all the possibilities, I guess most sensible applications with units will implement only a selection.

High Performance Mark
I like the separation of the unit from the "meta-units". On the other hand, this one strays toward the idea of enumerating all possible options. For example: kilowatt-hour, kilowatt-minute, kilowatt-second, kilowatt-day, kilowatt-week, etc. should all work, even if they don't necessarily make sense. That being said, this one also suffers from the complexity of Acre foot -> Cubic meter conversion.
Stargazer712
Nice answer, really gets to the core need of separating numerical values, unit classes (distance, time, etc.) and particular units (cm, s, ...).
Dave
Your system, as You describe it, is really intuitive for atomic units. How would You attempt to do arithmetic operations where compound units are included (watt/s, kg/m^3, etc)? Be so kind and give us some random thoughts High Performance Mark.
Dave
A: 

I would have a table with these fields:

conversionID
fromUnit
toUnit
multiplier

and however many rows you need to store all the conversions you want to support

If you want to support a multi-step process (degrees F to C), you'd need a one-to-many relationship with the units table, say called conversionStep, with fields like

conversionID
sequence
operator
value

If you want to store one set of conversions but support multi-step conversions, like storing

Feet|Inches|1/12
Inches|Centimeters|2.54

and supporting converting from Feet to Centimeters, I would store a conversion plan in another table, like

conversionPlanID
startUnits
endUnits
via

your row would look like

1 | feet | centimeters | inches
Beth
The only problem with that is that for N units, you would need to have (N choose 2) fields in the table. Not only is that memory intensive (complexity O(n^2)), but it is a fairly rigid data structure.
Stargazer712
you could add an inverseMultiplier field, if you didn't want to store the reverse combination again, but a single multiplier is all I'm seeing in your example, so I'm not sure what other flexibility you need.
Beth
@stargazer712 you mean rows in the table?
Beth
@Beth, yes rows in the table. For example, with how you originally described it, adding inches would necessitate the need for a conversion from inches/hour -> km/hour, inches/second -> km/hour, Acre inches -> Acre feet, Acre inches -> Meters^3. I said it would be rigid, because units can be anything. inches*feet/radians might have meaning to someone, and you should be able to handle this unit without needing an entry for every possibility.
Stargazer712