views:

86

answers:

1

I have a question about the SQL standard which I'm hoping a SQL language lawyer can help with.

Certain expressions just don't work. 62 / 0, for example. The SQL standard specifies quite a few ways in which expressions can go wrong in similar ways. Lots of languages deal with these expressions using special exceptional flow control, or bottom psuedo-values.

I have a table, t, with (only) two columns, x and y each of type int. I suspect it isn't relevant, but for definiteness let's say that (x,y) is the primary key of t. This table contains (only) the following values:

x    y
7    2
3    0
4    1
26   5
31   0
9    3

What behavior is required by the SQL standard for SELECT expressions operating on this table which may involve division(s) by zero? Alternatively, if no one behavior is required, what behaviors are permitted?

For example, what behavior is required for the following select statements?

The easy one:

SELECT x, y, x / y AS quot
FROM t

A harder one:

SELECT x, y, x / y AS quot
FROM t
WHERE y != 0

An even harder one:

SELECT x, y, x / y AS quot
FROM t
WHERE x % 2 = 0

Would an implementation (say, one that failed to realize on a more complex version of this query that the restriction could be moved inside the extension) be permitted to produce a division by zero error in response to this query, because, say it attempted to divide 3 by 0 as part of the extension before performing the restriction and realizing that 3 % 2 = 1? This could become important if, for example, the extension was over a small table but the result--when joined with a large table and restricted on the basis of data in the large table--ended up restricting away all of the rows which would have required division by zero.

If t had millions of rows, and this last query were performed by a table scan, would an implementation be permitted to return the first several million results before discovering a division by zero near the end when encountering one even value of x with a zero value of y? Would it be required to buffer?

There are even worse cases, ponder this one, which depending on the semantics can ruin boolean short-circuiting or require four-valued boolean logic in restrictions:

SELECT x, y
FROM t
WHERE ((x / y) >= 2) AND ((x % 2) = 0)

If the table is large, this short-circuiting problem can get really crazy. Imagine the table had a million rows, one of which had a 0 divisor. What would the standard say is the semantics of:

SELECT CASE 
       WHEN EXISTS 
            (
                SELECT x, y, x / y AS quot
                FROM t
            )
       THEN 1
       ELSE 0
       END AS what_is_my_value

It seems like this value should probably be an error since it depends on the emptiness or non-emptiness of a result which is an error, but adopting those semantics would seem to prohibit the optimizer for short-circuiting the table scan here. Does this existence query require proving the existence of one non-bottoming row, or also the non-existence of a bottoming row?

I'd appreciate guidance here, because I can't seem to find the relevant part(s) of the specification.

+1  A: 

All implementations of SQL that I've worked with treat a division by 0 as an immediate NaN or #INF. The division is supposed to be handled by the front end, not by the implementation itself. The query should not bottom out, but the result set needs to return NaN in this case. Therefore, it's returned at the same time as the result set, and no special warning or message is brought up to the user.

At any rate, to properly deal with this, use the following query:

select
   x, y, 
   case y 
       when 0 then null 
       else x / y 
   end as quot
from
   t

To answer your last question, this statement:

SELECT x, y, x / y AS quot
FROM t

Would return this:

x    y   quot
7    2    3.5
3    0    NaN
4    1      4
26   5    5.2
31   0    NaN
9    3      3

So, your exists would find all the rows in t, regardless of what their quotient was.

Additionally, I was reading over your question again and realized I hadn't discussed where clauses (for shame!). The where clause, or predicate, should always be applied before the columns are calculated.

Think about this query:

select x, y, x/y as quot from t where x%2 = 0

If we had a record (3,0), it applies the where condition, and checks if 3 % 2 = 0. It does not, so it doesn't include that record in the column calculations, and leaves it right where it is.

Eric