A pure regular expression solution is somewhat inelegant, because the pattern for a single unit has to be repeated several times to accommodate multiplication and division:
(?x)
( # Capture 1: the entire matched composed unit string
( # Capture 2: one unit including the optional prefix
(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)? # Capture 3: optional prefix, taken from http://en.wikipedia.org/wiki/SI_prefix
(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L) # Capture 4: Base units and derived units w/o °C, rad and sr, but with L/l for litre
(\^[+-]?[1-9]\d*)? # Capture 5: Optional power with optional sign. \^0 and \^-0 are not permitted
| # or
1 # one permitted, e.g. in 1/s
)
(?: # Zero or more repetitions of one unit, plus the multiplication sign
·( # Capture 6: one unit including the optional prefix
(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)? # Capture 7
(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L) # Capture 8
(\^[+-]?[1-9]\d*)? # Capture 9
| # or
1 # one permitted, e.g. in 1/s
)
)*
(?: # Optional: possibly multiplied units underneath a denominator sign
\/( # Capture 10
(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)? # Capture 11
(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L) # Capture 12
(\^[+-]?[1-9]\d*)? # Capture 13
| # or
1 # one permitted, e.g. in 1/s
)
(?: # Zero or more repetitions of one unit, plus the multiplication sign
·( # Capture 14
(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)? # Capture 15
(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L) # Capture 16
(\^[+-]?[1-9]\d*)? # Capture 17
| # or
1 # one permitted, e.g. in 1/s
)
)*
)?
)
I have included the litre as a unit, even though it is not an SI unit. I also require the middle dot (·) as the multiplication sign. You may modify either choice if needed. If you construct the regular expression from several base strings, it becomes much easier to grasp:
prefix = "(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)"
unit = "(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L)"
power = "(\^[+-]?[1-9]\d*)"
unitAndPrefix = "(" + prefix + "?" + unit + power + "?" + "|1" + ")"
multiplied = unitAndPrefix + "(?:·" + unitAndPrefix + ")*"
withDenominator = multiplied + "(?:\/" + multiplied + ")?"
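As a minimal sketch, the same construction could look like this in Python. The names `UNIT_RE` and `is_unit_string` are hypothetical and only serve the illustration. Raw strings keep the backslashes intact, and `fullmatch` is used so that the whole input has to be a unit string, which also lets backtracking resolve ambiguities such as `m` versus `mol`.

```python
import re

# Base strings, copied from the construction above.
prefix = r"(Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)"
unit = r"(m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L)"
power = r"(\^[+-]?[1-9]\d*)"

# One unit with optional prefix and optional power, or the literal 1.
unit_and_prefix = "(" + prefix + "?" + unit + power + "?" + "|1" + ")"
# One or more units joined by the multiplication sign.
multiplied = unit_and_prefix + "(?:·" + unit_and_prefix + ")*"
# Optionally followed by a denominator of the same shape.
with_denominator = multiplied + "(?:/" + multiplied + ")?"

# Hypothetical name: the compiled pattern.
UNIT_RE = re.compile(with_denominator)


def is_unit_string(text):
    """Return True if the whole of text is a syntactically valid unit string."""
    return UNIT_RE.fullmatch(text) is not None
```

With these definitions, `is_unit_string("kg·m/s^2")` and `is_unit_string("1/s")` are accepted, while `is_unit_string("m^0")` is rejected because a zero exponent is not permitted.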
The regular expression does not, of course, do any consistency checking; it also accepts strings such as kg^-1·kg^-1·1/kg^-2 as valid.
You can also adapt the regular expression as required, e.g. by using * as the multiplication character (which has to be escaped as \* inside the pattern).
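One way to make the multiplication character configurable is to pass it into a small builder function. This is only a sketch in Python: the function name `unit_pattern` and its parameter `mul` are hypothetical. `re.escape` takes care of characters such as `*` that have a special meaning in regular expressions. The inner groups are non-capturing here, on the assumption that only a yes/no answer is needed.

```python
import re


def unit_pattern(mul="·"):
    """Compile the unit pattern with mul as the multiplication character."""
    prefix = r"(?:Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)"
    unit = r"(?:m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L)"
    power = r"(?:\^[+-]?[1-9]\d*)"
    # Escape the separator so that e.g. '*' is matched literally.
    sep = re.escape(mul)
    unit_and_prefix = "(?:" + prefix + "?" + unit + power + "?" + "|1)"
    multiplied = unit_and_prefix + "(?:" + sep + unit_and_prefix + ")*"
    return re.compile(multiplied + "(?:/" + multiplied + ")?")


# Usage: the same unit string written with '*' instead of the middle dot.
star_pattern = unit_pattern("*")
accepted = star_pattern.fullmatch("kg*m/s^2") is not None
```

A pattern built with `unit_pattern("*")` accepts `kg*m/s^2` but no longer accepts `kg·m/s^2`. If both spellings should be valid, the separator would have to become a character class instead.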