However, I have often read about
performance problems when fields are
nullable and been advised to use an
empty string in cases where NULL is
actually semantically correct.
I'm going to be nit-picky about word choice for a moment:
- Even if it were a significant performance factor, that doesn't make it semantically correct to use a value instead of NULL. In SQL, NULL has a semantic role, to denote a missing or inapplicable value. The performance characteristics of NULL in a given RDBMS implementation are independent of this. The performance may vary from brand to brand or from version to version, but the purpose of NULL in the language is consistent.
In any case, I have not heard of any evidence that NULL performs poorly. I'd be interested in any references to performance measurements that show nullable columns perform worse than non-nullable columns.
I'm not saying I'm not wrong or that it can't be true in some cases -- just that it's not meaningful to make idle suppositions. Science is not made up of conjecture; one has to show evidence with repeatable measurements.
Metrics also tell you by how much the performance differs, so you can make a judgment about whether it's something to worth worrying about. That is, the impact could be measurable and nonzero, but still insignificant compared to greater performance factors, such as properly indexing tables or sizing your database cache.
In MySQL, searches for NULL can benefit from an index:
mysql> CREATE TABLE foo (
i INT NOT NULL,
j INT DEFAULT NULL,
PRIMARY KEY (i),
UNIQUE KEY j_index (j)
);
mysql> INSERT INTO foo (i, j) VALUES
(1, 1), (2, 2), (3, NULL), (4, NULL), (5, 5);
mysql> EXPLAIN SELECT * FROM foo WHERE i = 3;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | foo | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
mysql> EXPLAIN SELECT * FROM foo WHERE j IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | foo | ref | j_index | j_index | 5 | const | 2 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
Note that's still not a measurement of performance. I've only shown that you can use an index while searching for NULL. I'm going to assert (admittedly without having measured, but hey this is just StackOverflow) that the benefit of an index overshadows any possible penalty when searching for NULL versus a blank string.
It's not a correct design decision to choose zero or blank or any other value to substitute for NULL. You may need to use those values as significant in the column. That's why NULL exists, as a value that is by definition outside the domain of values of any data type, so you can use the full range of values of integers or strings or whatever and still have something to signify "none of the above values."