views:

146

answers:

8

Hello, I have a table representing standards of alloys. The standard is partly based on the chemical composition of the alloys. The composition is presented in percentages. The percentage is determined by a chemical composition test. Sample data.

But sometimes, the lab cannot measure below a certain percentage. So they indicate that the element is present, but the percentage is less than they can measure. I was confused how to accurately store such a number in an SQL database. I thought to store the number with a negative sign. No element can have a negative composition of course, but i can interpret this as less than the specified value. Or option is to add another column for each element!! The latter option i really don't like. Any other ideas? It's a small issue if you think about it, but i think a crowd is always wiser. Somebody might have a neater solution.


Question updated:

Thanks for all the replies.

  • The test results come from different labs, so there is no common lower bound.
  • The when the percentage of Titanium is less than <0.0004 for example, the number is still important, only the formula will differ slightly in this case.

Hence the value cannot be stored as NULL, and i don't know the lower bound for all values. Tricky one.

Another possibility i thought of is to store it as a string. Any other ideas?

+6  A: 

What you're talking about is a sentinel value. It's a common technique. Strings in most languages after all use 0 as a sentinel end-of-string value. You can do that. You just need to find a number that makes sense and isn't used for anything else. Many string functions will return -1 to indicate what you're looking for isn't there.

0 might work because if the element isn't there there shouldn't even be a record. You also face the problem that it might be mistaken for actually meaning 0. -1 is another option. It doesn't have that same problem obviously.

Another column to indicate if the amount is measurable or not is also a viable option. The case for this one becomes stronger if you need to store different categories of trace elements (eg <1%, <0.1%, <0.01%, etc). Storing the negative of those numbers seems a bit hacky to me.

cletus
"0" as a sentinel seems reasonable, because for many calculations that's an acceptable value.
MSalters
+1  A: 

I would have a table modeling the certificate, in a one to many relation with another table, storing the values for elements. Then, I would still have the elements table containing the value in one column and a flag (less than) as a separate column.

Draft:

create table CERTIFICATES
(
  PK_ID integer,
  NAME varchar(128)
)

create table ELEMENTS
(
  ELEMENT_ID varchar(2),
  CERTIFICATE_ID integer,
  CONCENTRATION number,
  MEASURABLE integer
)

Depending on the database engine you're using, the types of the columns may vary.

Cătălin Pitiș
+2  A: 

I would use a threshold value which is at least one significant digit smaller than your smallest expected value. This way you can logically say that any value less than say 0.01, can be presented to you application as a "trace" amount. This remains easy to understand and gives you flexibility in determining where your threshold should lie.

Programming Hero
+3  A: 

You could just store it as NULL, meaning that the value exists but is undefined.

Any arithmetic operation with a NULL yields a NULL.

Division by NULL is safe.

NULL's are ignored by the aggregation functions, so queries like these:

SELECT  SUM(metal_percent), COUNT(metal_percent)
FROM    alloys
GROUP BY
        metal

will give you the sum and the count of the actual, defined values, not taking the unfilled values into account.

Quassnoi
You'd confuse "NULL=not measured" with "NULL=measured but below the capabilities of the equipment"
MSalters
`@MSalters`: good point. But as noted above, the records should not even get into the table in this case. If the table design does not allow it, then yes, another column is required ("reason for value absence")
Quassnoi
`@MSalters`: in case of a zero, we could confuse "no traces of metal" and "traces of metal we couldn't measure".
Quassnoi
+2  A: 

Why not add another column to store whether or not its a trace amount

This will allow you to to save the amount that the trace is less than too

John McG
A: 

Another field seems like the way to go; call it 'MinMeasurablePercent'.

Noon Silk
+1  A: 

Since there is no common lowest threshold value and NULL is not acceptable, the cleanest solution now is to have a marker column which indicates whether there is a quantifiable amount or a trace amount present. A value of "Trace" would indicate to anybody reading the raw data that only a trace amount was present. A value of "Quantity" would indicate that you should check an amount column to find the actual quantity present.

I would have to warn against storing numerical values as strings. It will inevitably add additional pain, since you now lose the assertions a strong type definition gives you. When your application consumes the values in that column, it has to read the string to determine whether it's a sentinel value, a numeric value or simply some other string it can't interpret. Trying to handle data conversion errors at this point in your application is something I'm sure you don't want to be doing.

Programming Hero
+1  A: 

Since the constraints of the values are well defined (cannot have negative composition), I would go for the "negative value to indicate less-than" approach. As long as this use of such sentinel values are sufficiently documented, it should be reasonably easy to implement and maintain.

An alternative but similar method would be to add 100 to the values, assuming that you can't get more than 100%. So <0.001 becomes 100.001.

lsc