So, I have a table where StartDate and EndDate are used to determine whether a record is active. I thought of using NULLs to relieve the maintainers from having to manufacture crazy dates for some of the records. For example, if NULL were defined as positive infinity when used in EndDate, the data maintainers wouldn't need to come up with something like 1-1-2100 for long-lived records.

I can speculate on some of the trade-offs on my own: defining NULL as infinity means cleaner data and the elimination of periodic maintenance work, but it also means longer queries and stored procedures. So I was wondering how you guys in the real world weigh in on this.

EDIT: the opinions are about half and half. If I had clarified that StartDate and EndDate are used solely to determine the activeness of a record in the WHERE clause, and never appear in the select list, would that tilt the scale? One subject I probably need to read up on is indexing. Thanks, all.
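For concreteness, the two conventions being weighed can be sketched with Python's built-in sqlite3 module (the table, column names, and sentinel date below are illustrative, and the NULL-aware predicate is just one common way to write the activeness check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, StartDate TEXT, EndDate TEXT)")
# Convention A: NULL EndDate means "no end" (open-ended record).
conn.execute("INSERT INTO records VALUES ('open-ended', '2008-01-01', NULL)")
# Convention B: a manufactured far-future sentinel date.
conn.execute("INSERT INTO records VALUES ('sentinel', '2008-01-01', '2100-01-01')")

today = "2009-06-01"
# Activeness check under convention A: NULL needs explicit handling.
active_a = conn.execute(
    "SELECT name FROM records "
    "WHERE StartDate <= ? AND (EndDate IS NULL OR EndDate >= ?) "
    "ORDER BY name",
    (today, today)).fetchall()
# Activeness check under convention B: a plain BETWEEN works, but silently
# drops the NULL row because comparisons with NULL evaluate to UNKNOWN.
active_b = conn.execute(
    "SELECT name FROM records WHERE ? BETWEEN StartDate AND EndDate",
    (today,)).fetchall()
print(active_a)  # [('open-ended',), ('sentinel',)]
print(active_b)  # [('sentinel',)]
```

ISO-formatted date strings compare correctly as text, which is why the plain `<=`/`>=` comparisons work here.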

+2  A: 

I've used NULL for this purpose before with no problems, but I haven't developed any large-scale applications.

orthod0ks
+1  A: 

I'd definitely use null in this case. The stored procedures and queries aren't that much of a big deal.

If you have a value in there like '2100-01-01', then just looking at it, I would assume that it is a valid value. If I see NULL, I consider it to have special meaning (besides lack of value, although sometimes, that's all it is, and that's fine).

casperOne
+1  A: 

I would tend to use defined Min and Max dates instead of NULLs, primarily because, at least in C#, when you get the data out of the DB you have to start dealing with Nullable types, and I find that annoying.

It also tends to make it easier on queries. If I query with an end date of Max, then I get all dates with a max end date, plus all dates with an end date less than max, which is usually what I want. If I have nulls, then I have to do a join to get dates that are less than max and null values. Hope that makes sense.

I guess, bottom line, I approach these problems from a usability standpoint for developers, not db users, so that's where my perspective lies.

Chris Holmes
+1  A: 

I would say your own speculations are quite correct. It's really one of those trade-off type situations.

My own personal preference is to disallow NULLs in date-type fields and always use a "high value" for something like an end date. For me, this greatly simplifies all the queries I'll have to run against a start/end date and saves me from lots and lots of NULL checking across multiple queries.

That said, it depends upon the application. If I must have NULLs in date fields, I'll use them, but if I can get away with substituting a real date, I will. Of course, the downside of using a real date is that it is just that, a real date, and ceases to have the standard database meaning of NULL, i.e. an absence of data.

CraigTP
A: 

I think using null is better than some arbitrary special value. If nothing else, seeing a null value makes you pause and think about what it might mean.

Tundey
A: 

If you do go with using a default "system" date for the End Date field (and even the Start Date field), just be sure to document the meaning of those dates somewhere in your code (and possibly the database schema). It will make it easier for those who come after you to understand the meaning of an otherwise arbitrary date.

Tim Lentine
+4  A: 

If either StartDate or EndDate of your record is NULL, then the following condition:

BETWEEN StartDate AND EndDate

will never match this record.

You'll need to do the following:

BETWEEN IFNULL(StartDate, '01.01.1000') AND IFNULL(EndDate, '01.01.3000')

, which, of course, is not good for indices.

Of course, you may create a function based index, but you'll need to provide a constant for it anyway.

Summary: if you want performance, use constants; do not use NULLs.
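Both behaviors described above (plain BETWEEN dropping the NULL row, the IFNULL rewrite matching it) can be reproduced with a self-contained sqlite3 sketch; the sentinel bounds are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, StartDate TEXT, EndDate TEXT)")
conn.execute("INSERT INTO records VALUES (1, NULL, NULL)")  # fully open-ended

# Plain BETWEEN: any comparison with NULL yields UNKNOWN, so the row never matches.
plain = conn.execute(
    "SELECT id FROM records WHERE '2009-01-01' BETWEEN StartDate AND EndDate"
).fetchall()

# IFNULL substitutes sentinel bounds at query time, so the row matches --
# but wrapping the columns in a function defeats a plain index on them.
wrapped = conn.execute(
    "SELECT id FROM records WHERE '2009-01-01' "
    "BETWEEN IFNULL(StartDate, '1000-01-01') AND IFNULL(EndDate, '3000-01-01')"
).fetchall()

print(plain)    # []
print(wrapped)  # [(1,)]
```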

Quassnoi
A: 

if you use NULL dates in your DB, make sure you use DateTime? (nullable C#) in your code, that will make your life a lot easier :)

roman m
+4  A: 

NULL already has a meaning; adding an interpretation of NULL as a valid data value leads to ambiguity. When the query is run and data returned, what will a resulting NULL mean? How do you differentiate between a fail state and a valid Max state?

Define min and max constants and let Null be Null.

Great Comments:
@Haoest - you are right, a Null is not the same as a set of Nulls, and I was not clear. SQL's behavior when processing Nulls will require more code for more checks, and SQL's results when finding a Null may not match the programmer's intuition (see NULL (SQL)).
@MBCook - great link, wish I had posted that myself - thanks

Update: Once a query or function returns a NULL for that date, you no longer know whether the NULL was assigned because it means Max or because a NULL was propagated up the query.

Paxic
What's the ambiguity? A null set is not the same as a set of nulls.
Haoest
NULL is a magic value. SQL is basically trivalent logic, things are TRUE, FALSE or NULL (basically, unknown). A NULL EndDate should mean "unknown end date". It's mentioned on the Wikipedia page: http://en.wikipedia.org/wiki/Trivalent_logic
MBCook
+1  A: 

I'd try to work with NULL values as long as possible.

My understanding of NULL in an EndDate column would be that no end date is defined as of now. That would be consistent with the record being valid from its start date onward.

I agree with Quassnoi that you can't directly specify

BETWEEN StartDate AND EndDate

but instead of his suggestion (which makes using indexes difficult), this one works as well:

(somedate >= StartDate or StartDate is null)
AND
(somedate <= EndDate or EndDate is null)

AFAIK this would allow usage of an index, but check the execution plan for your specific circumstances.

The disadvantage of using "special" start/end dates instead is enforceability. If all your DB access goes through a specific programming language with more or less mandatory libraries, you may be able to make this work. If, however, you have different access paths (direct SQL, different languages/libraries), this would be very hard to do.

There may be a third way: use NULLs for DML statements and then have a trigger change them to a predefined min/max value. Then SELECTs might be easier. But using a trigger opens another can of worms ...
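The trigger idea can be sketched with sqlite3 (the trigger name and sentinel date are made up for illustration); writers get to use NULL while readers always see a real date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, StartDate TEXT, EndDate TEXT)")
# The trigger rewrites a NULL EndDate to a predefined max sentinel on insert.
conn.execute("""
    CREATE TRIGGER fill_end_date AFTER INSERT ON records
    WHEN NEW.EndDate IS NULL
    BEGIN
        UPDATE records SET EndDate = '3000-01-01' WHERE rowid = NEW.rowid;
    END
""")
conn.execute("INSERT INTO records VALUES (1, '2008-01-01', NULL)")
row = conn.execute("SELECT EndDate FROM records WHERE id = 1").fetchone()
print(row)  # ('3000-01-01',)
```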

My conclusion: this scenario is a decent use for a database NULL. In my experience, I have not yet run into serious performance problems, but I agree that the extra NULL handling is a bit of a nuisance.

IronGoofy
(somedate >= StartDate or StartDate is null) will not use an index either. NULLs are not even kept in the index, so the RDBMS will need to perform a full table scan anyway to make sure nothing is missed, and a single full table scan for the whole condition is better than an index search AND a full table scan for NULLs.
Quassnoi