views:

47

answers:

1

I'm reading a paper on SCOPE that discusses SQL like query semantics for big data applications. It does not follow how SQL deals with null values and discusses "null-extended" variables, which I have not encountered before. Consider the pseudo-query

SELECT * FROM DATA WHERE A != B

What does "the predicate A != B is satisfied only for rows that are null extended on B" mean?

+3  A: 

The term "null extended" is used generally to refer to the set algebra in a modern DBMS. That is, it "extends" regular relational algebra by introducing NULL values, or rather a single universal NULL value. Every predicate involving a NULL has a defined result that is logically consistent with the rest of the algebra.

I've also seen the term used to refer to outer joins. For example, this query:

SELECT A.id, A.name AS NameA, B.name AS NameB
FROM A
LEFT JOIN B
ON B.id = A.id

Might give you the following results:

  id  |   NameA   |   NameB
------+-----------+-----------
  10  |   John    |   Smith
  11  |   Jane    |   Doe
  12  |   Bob     |   NULL

What's happening here is that for id 12, A is being "null extended" with the columns from B because there are no values from B available. In general, when you perform a join on two relations, A and B, and you want to include tuples in A that have no matching tuples in B (outer join), then A must be null-extended with the attributes of B in order to form a complete result set.


This specific line that you put in quotations:

the predicate A != B is satisfied only for rows that are null extended on B

...doesn't really make sense when taken out of context. You have to look at the whole thing:

SQ = SELECT DISTINCT Sc FROM S WHERE Sa < 20;
M1 = SELECT Ra, Rb, Rc FROM R WHERE Rb < 100;
M2 = SELECT Ra, Rb, Rc, Sc
    FROM M1 LEFT OUTER JOIN SQ ON Rc == Sc;
Q = SELECT Ra, Rb FROM M2
WHERE Ra > 5 OR Rc != Sc;

And a little bit later:

M1 rows that do not join with any rows from SQ are null-extended on Sc. The fourth select (Q) computes the final result by applying the predicate referencing the subquery. Note that the predicate Rc != Sc is satisfied only for rows that are null-extended on Sc.

Now with some context, it's easier to understand what they're trying to say. Since the join condition in M2 is Rc == Sc, it follows that the condition Rc != Sc can only be true if Sc is NULL - otherwise, Sc would be equal to Rc because that's how it was joined. In other words, the condition Rc != Sc can only be true for the rows in M2 where M1 was null-extended with the columns from SQ because it did not match any rows in SQ.

Hopefully that clears up some of the confusion.

Aaronaught