views:

203

answers:

5

I'm working with data that is natively supplied as rational numbers. I have a slick generic C# class which beautifully represents this data in C# and allows conversion to many other forms. Unfortunately, when I turn around and want to store this in SQL, I've got a couple solutions in mind but none of them are very satisfying.

Here is an example. I have the raw value 2/3 which my new Rational<int>(2, 3) easily handles in C#. The options I've thought of for storing this in the database are as follows:

  1. Just as a decimal/floating point, i.e. value = 0.66666667 of various precisions and exactness. Pros: this allows me to query the data, e.g. find values < 1. Cons: it has a loss of exactness and it is ugly when I go to display this simple value back in the UI.

  2. Store as two exact integer fields, e.g. numerator = 2, denominator = 3 of various precisions and exactness. Pros: This allows me to precisely represent the original value and display it in its simplest form later. Cons: I now have two fields to represent this value and querying is now complicated/less efficient as every query must perform the arithmetic, e.g. find numerator / denominator < 1.

  3. Serialize as string data, i.e. "2/3". I would be able to know the max string length and have a varchar that could hold this. Pros: I'm back to one field but with an exact representation. Cons: querying is pretty much busted and pay a serialization cost.

  4. A combination of #1 & #2. Pros: easily/efficiently query for ranges of values, and have precise values in the UI. Cons: three fields (!?!) to hold one piece of data, must keep multiple representations in sync which breaks D.R.Y.

  5. A combination of #1 & #3. Pros: easily/efficiently query for ranges of values, and have precise values in the UI. Cons: back down to two fields to hold one piece data, must keep multiple representations in sync which breaks D.R.Y., and must pay extra serialization costs.

Does anyone have another out-of-the-box solution which is better than these? Are there other things I'm not considering? Is there a relatively easy way to do this in SQL that I'm just unaware of?

+2  A: 

I would probably go with Option #4, but use a calculated column for the 3rd column to avoid the sync/DRY issue (and also means you actually only store 2 columns, avoiding the "three fields" issue).

In SQL server, calculated column is defined like so:

CREATE TABLE dbo.Whatever(
   Numerator INT NOT NULL,
   Denominator INT NOT NULL,
   Value AS (Numerator / Denominator) PERSISTED
)

(note you may have to do some type conversion and verification that Denominator is not zero, etc).

Also, SQL 2005 added a PERSISTED calculated column that would get rid of the calculation at query time.

Chris Shaffer
Hmm... I do like the purity of this: No serialization, no CLR in SQL. I get precision, query-ability, and D.R.Y. Storage size is the only real downside and that isn't that much.
McKAMEY
@Chris: You can use a CASE statement to stop evaluation if the denominator is null - see my answer for details.
OMG Ponies
If I go with a calculated column like this, `Numerator INT NOT NULL, Denominator INT NOT NULL, Value AS CASE WHEN Denominator <> 0 THEN (Numerator / Denominator) ELSE 0 END PERSISTED` what is the data type? Is it going to be a decimal type or is it going to try to truncate it into an `INT`?
McKAMEY
With it as typed, it will probably be an integer. You can cast one of the numbers or multiply by 1.0 to get a decimal (eg, (Numerator * 1.0) / Denominator) or CAST(Numerator AS DECIMAL(14, 5)) [whatever precision you want]).
Chris Shaffer
Additionally, if you want to get rid of the CASE, you could put a CHECK constraint on the Denominator not to be zero; Depends on if it is allowed by your data type or not.
Chris Shaffer
Thanks. These are good refinements.
McKAMEY
A: 

I have experimented a little bit with using the geometry data type in SQL Server 2008 to store and manipulate rational numbers. Basically, I assume that the numerator goes in the X slot and the denominator goes in the Y slot of a fictitious geometry point.

This was good for my needs, but it might be useless for yours. That will depend on what your priorities are (performance, code readability, etc.). I personally found that T-SQL for geometry data manipulation is hard to write and read.

CesarGon
Interesting idea. I'm guessing though that I wouldn't be able to query against it as a single value.
McKAMEY
I am not sure, to be honest. I didn't get that far. And the T-SQL query syntax for geometry is not too straightforward IMHO.
CesarGon
A: 

how much precision are you looking at ? double/float provide decent precision(in my opinion). Am pretty sure scientific/astronomical data need a lot more precision that that. I do know that libraries like matlab and mathematica are good at these. I found that you can use mathematica with your .net program. Here is the link

Edit: adding more links and quotes

"When Mathematica operates on rational numbers, it gives an exact result no matter how many digits are required" from here

Another good read, but you would have to implement it I guess

ram
Good reads; I like the paper. It might be useful for something in the future, but probably not in SQL without going crazy.
McKAMEY
+4  A: 

If you're using SQL Server 2005 or 2008, you have the option to define your own CLR data types:

Beginning with SQL Server 2005, you can use user-defined types (UDTs) to extend the scalar type system of the server, enabling storage of CLR objects in a SQL Server database. UDTs can contain multiple elements and can have behaviors, differentiating them from the traditional alias data types which consist of a single SQL Server system data type.

Because UDTs are accessed by the system as a whole, their use for complex data types may negatively impact performance. Complex data is generally best modeled using traditional rows and tables. UDTs in SQL Server are well suited to the following:

  • Date, time, currency, and extended numeric types
  • Geospatial applications
  • Encoded or encrypted data

If you can live with the limitations, I can't imagine a better way to map data you're already capturing in a custom class.

Michael Petrotta
Interesting idea but CLR in SQL seems overkill to me since I only have two integers. This may be a useful idea for other problems in the future though.
McKAMEY
Why is it overkill? The CLR class doesn't have to be complex.
RickNZ
Requiring CLR in SQL for a very simple class is overkill for my needs.
McKAMEY
+1  A: 

How much precision do you need?

The language, C# or otherwise, will round 2/3rds at a given position in the precision. If it's acceptable for whatever you are working on to use decimal values of say scientific notation of 10, then set the precision accordingly in the db.

If the precision is really a concern, then separate the numerator & denominator. This would ensure you always have access to whatever precision you want, and you can use a computed column to represent the value for quick filtering:

numerator INT,
denominator INT,
result AS CASE WHEN denominator > 0 THEN numerator / denominator ELSE NULL END
OMG Ponies
Part of the point was I shouldn't have to choose how much precision I need for something I can represent with a pair of very small integers. Thanks for the divide-by-zero check.
McKAMEY
@McKAMEY: There are costs involved, depending on what you want to do. Those costs could force you to re-evaluate your needs, and seek compromise.
OMG Ponies
Agreed. Displaying "0.6666667" in the UI for this application isn't acceptable which is why I started down the rational route. Thanks for your help.
McKAMEY