This is a follow-up to my question "Why to use “not null primary key” in TSQL?" [1]

As I understand from other discussions, some RDBMSs (for example MySQL and SQLite; which others?) permit "unique" NULLs in a primary key (PK).
I have read a lot about this and still cannot grasp why it is allowed and what it is for.


Update:
I believe it helps, when communicating with colleagues and other database professionals, to know how basic concepts and approaches, and their implementations, differ between DBMSs.
Please also add RDBMSs to the "NULL PK" and "NOT NULL PK" lists.

Update2:
MySQL is rehabilitated and returned to the "NOT NULL PK" list [2].
SQLite is added (thanks to Paul Hadfield's comment [1a]) to the "NULL PK" list [3]:

"For the purposes of determining the uniqueness of primary key values, NULL values are considered distinct from all other values, including other NULLs. If an INSERT or UPDATE statement attempts to modify the table content so that two or more rows feature identical primary key values, it is a constraint violation. According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem. So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns. Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly." [3]
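The behaviour quoted above can be checked with a small sketch. It assumes Python's standard sqlite3 module and an ordinary (rowid) table; the table and column names are hypothetical, and other SQLite table types may behave differently.

```python
import sqlite3

# In-memory database; table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT PRIMARY KEY, v TEXT)")

# Two rows with a NULL key are both accepted: NULLs are treated as distinct.
conn.execute("INSERT INTO t (k, v) VALUES (NULL, 'first')")
conn.execute("INSERT INTO t (k, v) VALUES (NULL, 'second')")
null_key_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE k IS NULL"
).fetchone()[0]

# A duplicate non-NULL key is still a constraint violation.
conn.execute("INSERT INTO t (k, v) VALUES ('a', 'third')")
try:
    conn.execute("INSERT INTO t (k, v) VALUES ('a', 'fourth')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INTEGER PRIMARY KEY is the exception: a NULL is replaced by a generated id.
conn.execute("CREATE TABLE u (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO u (id, v) VALUES (NULL, 'x')")
generated_id = conn.execute("SELECT id FROM u").fetchone()[0]
```

Here null_key_rows counts the rows stored with a NULL key, duplicate_rejected records whether the repeated 'a' key was refused, and generated_id holds the value SQLite put in place of the NULL.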

Update:
I am surprised to see that answers which do not even address my question (stating well-known facts that are easily found on the internet) are more upvoted.

[1]
"Why to use “not null primary key” in TSQL?"
http://stackoverflow.com/questions/3905703/why-to-use-not-null-primary-key-in-tsql
[1a]
Comment to my question [1] by Paul Hadfield giving the reference to [3]

[2]
Answer of Hammerite to this post.
http://stackoverflow.com/questions/3906811/null-permitted-in-primary-key-why-and-in-which-dbms/3907195#3907195

[3] SQL As Understood By SQLite. CREATE TABLE
http://www.sqlite.org/lang_createtable.html

+1  A: 

Well, it could allow you to implement the Null Object Pattern natively within the database. So if you were using something similar in code, which interacted very closely with the DB, you could just look up the object corresponding to the key without having to special-case a null check.

Now whether this is worthwhile functionality I'm not sure, but it's really a question of whether the pros of disallowing null pkeys in absolutely all cases outweigh the cons of obstructing someone who (for better or worse) actually wants to use null keys. This would only be worth it if you could demonstrate some non-trivial improvement (such as faster key lookup) from being able to guarantee that keys are non-null. Some DB engines would show this, others might not. And if there aren't any real pros from forcing this, why artificially restrict your clients?

Andrzej Doyle
+3  A: 

As far as relational database theory is concerned:

  • The primary key of a table is used to uniquely identify each and every row in the table
  • A NULL value in a column indicates that you don't know what the value is
  • Therefore, you should never use the value of "I don't know" to uniquely identify a row in a table.

Depending upon the data you are modelling, a "made up" value can be used instead of NULL. I've used 0, "N/A", 'Jan 1, 1980', and similar values to represent dummy "known to be missing" data.

Most, if not all, DB engines do allow for a UNIQUE constraint or index, which does allow for NULL column values, though (ideally) only one row may be assigned the value null (otherwise it wouldn't be a unique value). This can be used to support the irritatingly pragmatic (but occasionally necessary) situations that don't fit neatly into relational theory.
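How many NULLs a UNIQUE column accepts differs between engines. As one data point only, here is a sketch assuming SQLite through Python's standard sqlite3 module; the table and column names are hypothetical, and other engines may behave differently.

```python
import sqlite3

# In-memory database; the table and column are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code TEXT UNIQUE)")

# SQLite accepts more than one NULL in a UNIQUE column.
conn.execute("INSERT INTO t (code) VALUES (NULL)")
conn.execute("INSERT INTO t (code) VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM t WHERE code IS NULL"
).fetchone()[0]

# A made-up placeholder such as 'N/A' is an ordinary value,
# so a second copy of it violates the constraint.
conn.execute("INSERT INTO t (code) VALUES ('N/A')")
try:
    conn.execute("INSERT INTO t (code) VALUES ('N/A')")
    placeholder_duplicate_rejected = False
except sqlite3.IntegrityError:
    placeholder_duplicate_rejected = True
```

The contrast is the point: a placeholder value takes part in uniqueness checks like any other value, while NULL in this engine does not.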

Philip Kelley
Please exonerate me from [you should never use the value of "I don't know"]! I use SQL Server! I asked about other RDBMSs.
vgv8
This concept is not platform-specific, it is part of the specifications of a relational database system. SQL, Oracle, MySQL, Postgres, etc. are all just implementations of these specifications. The question of how correct and/or accurate they are has spawned any number of near-religious flame-wars across the internet.
Philip Kelley
Using a bogus value that is well outside the range of valid values can screw with cardinality calculations and is a bad idea in general. NULL means NULL, JAN 1, 1990 does not mean NULL.
Stephanie Page
@Stephanie, for dates, I agree. For things like status codes, having something like "0 = Status not yet assigned" is better than just leaving it null. (Cardinality would be unaffected, as you'd still have N rows whether it was null or 0.)
Philip Kelley
@Philip, I didn't say cardinality is affected, I said cardinality calculations. Some RDBMSs store min and max column values and assume that there's an even distribution of values in between (in the absence of a histogram). If the rest of your keys are 1,2,3,4,5... you'll be fine. If they are 1000,1001,1002... the optimizer will assume that you have 1/1000 of the rows for a predicate of ID = n. Your example looks like a FK, in which case it won't matter, since you'll be filtering on Code = 'Status not yet assigned' as opposed to Code_ID = 0. You'll only be joining on that 0.
Stephanie Page
You can read more here: http://richardfoote.wordpress.com/2007/12/13/outlier-values-an-enemy-of-the-index/
Stephanie Page
+2  A: 

I don't know whether older versions of MySQL differ on this, but as of modern versions a primary key must be on columns that are not null. See the manual page on CREATE TABLE: "A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently)."

Hammerite
Thanks for putting me on the right track. Well, it seems I asked a stupid question because I read too much StackOverflow. It is because I believed the comments in http://stackoverflow.com/questions/3876785/sql-server-cant-insert-null-into-primary-key-field/3876808#3876808 and others. I have now checked that a PK in MySQL does not permit NULL.
vgv8
+3  A: 

Suppose you have a primary key containing a nullable column Kn.

If you want to have a second row rejected on the ground that in that second row, Kn is null and the table already contains a row with Kn null, then you are actually requiring that the system would treat the comparison "row1.Kn = row2.Kn" as giving TRUE (because you somehow want the system to detect that the key values in those rows are indeed equal). However, this comparison boils down to the comparison "null = null", and the standard already explicitly specifies that null doesn't compare equal to anything, including itself.
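The comparison rule can be observed directly. This is a minimal sketch assuming SQLite through Python's standard sqlite3 module (one engine only, picked because it needs no server); the column name kn simply mirrors Kn above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL does not yield TRUE; the result is NULL (unknown),
# which the sqlite3 module hands back as None.
eq_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the dedicated test for a missing value and does yield true (1).
is_null_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# In a WHERE clause an unknown comparison filters the row out,
# so a row whose kn is NULL does not even match itself on kn = kn.
matches = conn.execute(
    "SELECT COUNT(*) FROM (SELECT NULL AS kn) WHERE kn = kn"
).fetchone()[0]
```

A uniqueness check built on "=" therefore has nothing to report when both key values are null, which is the deviation described here.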

To allow for what you want would thus amount to SQL deviating from its own principles regarding the treatment of null. There are innumerable inconsistencies in SQL, but this particular one never got past the committee.

Erwin Smout
excellent answer.
Stephanie Page
"To allow for what you want" - you answered something I had not asked. I asked why some DBMSs allow it, not why they should not.
vgv8
A: 

I found that [3], given by Paul Hadfield in comment [1a], is the most relevant and helpful answer. I cannot accept a comment as the answer, least of all one posted in another thread, so I am posting this as an answer to my own question; the credit goes to Paul Hadfield.

vgv8