ansaurus

Question

How to deal with "partial" dates (2010-00-00) from MySQL in Django?

Answer 1

+1 A:

You could store the partial date as an integer (preferably in a field named for the portion of the date you are storing, such as year, month or day) and do validation and conversion to a date object in the model.

EDIT

If you need real date functionality, you probably need real, not partial, dates. For instance, does "get everything after 2010-00-00" return dates inclusive of 2010 or only dates in 2011 and beyond? The same goes for your other example of May 2010. The ways in which different languages/clients deal with partial dates (if they support them at all) are likely to be highly idiosyncratic, and they are unlikely to match MySQL's implementation.

On the other hand, if you store a year integer such as 2010, it is easy to ask the database for "all records with year > 2010" and understand exactly what the result should be, from any client, on any platform. You can even combine this approach for more complicated dates/queries, such as "all records with year > 2010 AND month > 5".
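As a minimal sketch of the integer approach, validation and conversion to a date object could look like the following. The class name PartialDateParts is hypothetical, and plain Python stands in here for a Django model.

```python
from datetime import date


class PartialDateParts:
    """Hypothetical holder for year/month/day integers; month and day may be None."""

    def __init__(self, year, month=None, day=None):
        if day is not None and month is None:
            raise ValueError("a day requires a month")
        # Validate by building a real date, substituting 1 for the missing parts.
        # date() raises ValueError for impossible values such as month 13.
        date(year, 1 if month is None else month, 1 if day is None else day)
        self.year, self.month, self.day = year, month, day

    def to_date(self):
        """Return a datetime.date only when all three parts are known."""
        if self.month is None or self.day is None:
            return None
        return date(self.year, self.month, self.day)
```

With this, PartialDateParts(2010).to_date() gives None, while PartialDateParts(2010, 5, 4).to_date() gives a real date.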

SECOND EDIT

Your only other (and perhaps best) option is to store truly valid dates and come up with a convention in your application for what they mean. A DATETIME field named like date_month could have a value of 2010-05-01, but you would treat that as representing all dates in May, 2010. You would need to accommodate this when programming. If you had date_month in Python as a datetime object, you would need to call a function like date_month.end_of_month() to query dates following that month. (That is pseudocode, but could be easily implemented with something like the calendar module.)
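For what it's worth, the end_of_month() pseudocode can be sketched with the calendar module roughly like this. The function names are only illustrative (start_of_next_month is an extra helper, not something from the answer), and they are written as plain functions rather than methods.

```python
import calendar
from datetime import date, timedelta


def end_of_month(d):
    """Return the last day of the month that contains d."""
    # monthrange() returns (weekday of the first day, number of days in month).
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=last_day)


def start_of_next_month(d):
    """Return the first day of the month after the one that contains d."""
    return end_of_month(d) + timedelta(days=1)


# By the convention described above, this value stands for all of May 2010.
date_month = date(2010, 5, 1)
```

To query dates following that month, you would then compare against end_of_month(date_month) or start_of_next_month(date_month).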

Alison R. 2010-06-04 02:54:56

I already thought of this solution but I think it will not work in my case. See my EDIT.

Etienne 2010-06-04 03:29:59

Answer 2

A:

Can you store the date together with a flag that tells how much of the date is valid?

Something like this:

YEAR_VALID = 0x04
MONTH_VALID = 0x02
DAY_VALID = 0x01

Y_VALID = YEAR_VALID
YM_VALID = YEAR_VALID | MONTH_VALID
YMD_VALID = YEAR_VALID | MONTH_VALID | DAY_VALID

Then, if you have a date like 2010-00-00, convert that to 2010-01-01 and set the flag to Y_VALID. If you have a date like 2010-06-00, convert that to 2010-06-01 and set the flag to YM_VALID.

So, then, PartialDateField would be a class that bundles together a date and the date-valid flag described above.

P.S. You don't actually need to use the flags the way I showed it; that's the old C programmer in me coming to the surface. You could use Y_VALID, YM_VALID, YMD_VALID = range(3) and it would work about as well. The key is to have some kind of flag that tells you how much of the date to trust.
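Here is a rough sketch of the conversion, assuming the partial date arrives as a MySQL-style 'YYYY-MM-DD' string. The names FlaggedDate and from_mysql_string are made up for illustration.

```python
from collections import namedtuple
from datetime import date

YEAR_VALID = 0x04
MONTH_VALID = 0x02
DAY_VALID = 0x01

Y_VALID = YEAR_VALID
YM_VALID = YEAR_VALID | MONTH_VALID
YMD_VALID = YEAR_VALID | MONTH_VALID | DAY_VALID

# Hypothetical container: a real date plus a flag saying how much of it to trust.
FlaggedDate = namedtuple('FlaggedDate', 'date valid')


def from_mysql_string(text):
    """Convert 'YYYY-MM-DD', where MM or DD may be 00, into a FlaggedDate."""
    year, month, day = (int(part) for part in text.split('-'))
    if month == 0:
        # Without a month, any day value is ignored (an assumption of this sketch).
        return FlaggedDate(date(year, 1, 1), Y_VALID)
    if day == 0:
        return FlaggedDate(date(year, month, 1), YM_VALID)
    return FlaggedDate(date(year, month, day), YMD_VALID)
```

Checking a flag is then a bit test, e.g. from_mysql_string('2010-06-00').valid & DAY_VALID is 0, so the day should not be trusted.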

steveha 2010-06-04 05:22:17

This answer doesn't address the fact that Python doesn't consider something like 2010-00-00 a valid date (even if MySQL does). How do you suggest he store and retrieve that?

Alison R. 2010-06-04 14:25:37

This answer suggests that he convert such a date to 2010-01-01. Look, it's in there, I didn't just edit that. MySQL has a feature where you can store weird partial dates, but Python doesn't understand them, so I am specifically recommending not using the weird partial dates, but rather using dates and a flag that says how much of the date you can actually trust. You don't actually need to use the flags the way I showed it; that's the old C programmer in me coming to the surface. You could use Y_VALID, YM_VALID, YMD_VALID = range(3) and it would work about as well.

steveha 2010-06-06 06:12:08

Fair enough. I can't revoke my downvote unless the answer is edited, unfortunately. (P.S. this indeed seems like a rather C-ish solution ;)

Alison R. 2010-06-10 21:11:32

Okay, I'll edit the answer!

steveha 2010-06-15 03:28:13

Answer 3

A:

It sounds like you want to store a date interval. In Python this would (to my still-somewhat-noob understanding) most readily be implemented by storing two datetime.datetime objects, one specifying the start of the date range and the other specifying the end. In a manner similar to that used to specify list slices, the endpoint would not itself be included in the date range.

For example, this code would implement a date range as a named tuple:

>>> from datetime import datetime
>>> from collections import namedtuple
>>> DateRange = namedtuple('DateRange', 'start end')
>>> the_year_2010 = DateRange(datetime(2010, 1, 1), datetime(2011, 1, 1))
>>> the_year_2010.start <= datetime(2010, 4, 20) < the_year_2010.end
True
>>> the_year_2010.start <= datetime(2009, 12, 31) < the_year_2010.end
False
>>> the_year_2010.start <= datetime(2011, 1, 1) < the_year_2010.end
False

Or even add some magic:

>>> DateRange.__contains__ = lambda self, x: self.start <= x < self.end
>>> datetime(2010, 4, 20) in the_year_2010
True
>>> datetime(2011, 4, 20) in the_year_2010
False

This is such a useful concept that I'm pretty sure that somebody has already made an implementation available. For example, a quick glance suggests that the relativedelta class from the dateutil package will do this, and more expressively, by allowing a 'years' keyword argument to be passed to the constructor.

However, mapping such an object into database fields is somewhat more complicated, so you might be better off implementing it simply by just pulling both fields separately and then combining them. I guess this depends on the DB framework; I'm not very familiar with that aspect of Python yet.

In any case, I think the key is to think of a "partial date" as a range rather than as a simple value.
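To make that concrete, a partial date could be expanded into a half-open range along these lines. This is only a sketch; partial_to_range is a hypothetical helper, with None standing for an unknown month or day.

```python
from collections import namedtuple
from datetime import datetime, timedelta

DateRange = namedtuple('DateRange', 'start end')


def partial_to_range(year, month=None, day=None):
    """Return the half-open DateRange [start, end) covered by a partial date."""
    if month is None:
        # Only the year is known: the range is the whole year.
        return DateRange(datetime(year, 1, 1), datetime(year + 1, 1, 1))
    if day is None:
        # Year and month are known; December rolls over into the next year.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        return DateRange(datetime(year, month, 1),
                         datetime(next_year, next_month, 1))
    # A complete date covers exactly one day.
    start = datetime(year, month, day)
    return DateRange(start, start + timedelta(days=1))
```

So partial_to_range(2010) covers the same span as the_year_2010 in the example above, and partial_to_range(2010, 5) covers May 2010.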

edit

It's tempting, but I think inappropriate, to add more magic methods that will handle uses of the > and < operators. There's a bit of ambiguity there: does a date that's "greater than" a given range occur after the range's end, or after its beginning? It initially seems appropriate to use <= to indicate that the date on the right-hand side of the expression is after the start of the range, and < to indicate that it's after the end.

However, this implies equality between the range and a date within the range, which is incorrect: it would make the month of May 2010 equal to the year 2010, because May 4th, 2010 equates to both of them. I.e., you would end up with falsisms like 2010-04-20 == 2010 == 2010-05-04 being true.

So it would probably be better to implement a method like isafterstart to explicitly check if a date is after the beginning of the range. But again, somebody's probably already done it, so it's probably worth a look on PyPI to see what's considered production-ready. This is indicated by the presence of "Development Status :: 5 - Production/Stable" in the "Categories" section of a given module's PyPI page. Note that not all modules have been given a development status.

Or you could just keep it simple, and using the basic namedtuple implementation, explicitly check

>>> datetime(2012, 12, 21) >= the_year_2010.start
True
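For comparison, the explicit-method idea (isafterstart) might be sketched as below, using a namedtuple subclass instead of patching the class afterwards. The isafterend name is my own addition, and treating "after" as inclusive of the start is an assumption that matches the >= check above.

```python
from collections import namedtuple
from datetime import datetime


class DateRange(namedtuple('DateRange', 'start end')):
    """A half-open date range [start, end) with explicit comparison methods."""

    __slots__ = ()

    def __contains__(self, moment):
        return self.start <= moment < self.end

    def isafterstart(self, moment):
        """True if moment falls at or after the beginning of the range."""
        return moment >= self.start

    def isafterend(self, moment):
        """True if moment falls past the range (the end itself is excluded)."""
        return moment >= self.end


the_year_2010 = DateRange(datetime(2010, 1, 1), datetime(2011, 1, 1))
```

This keeps the two meanings of "after" separate, so nothing ends up implying that a date equals the range that contains it.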

intuited 2010-06-04 23:00:34

I am definitely not storing time intervals. This PartialDate will mostly be used to store birth dates for people, living and dead. Frequently, especially for dead people, my client does not have the complete date. But it did help me in my development to "think of a "partial date" as a range".

Etienne 2010-06-12 03:55:41

Answer 4

A:

First, thanks for all your answers. None of them, as is, was a good solution for my problem but, in your defense, I should add that I didn't give all the requirements. Each one helped me think about my problem, and some of your ideas are part of my final solution.

So my final solution, on the DB side, is to use a varchar field (limited to 10 chars) and store the date in it as a string in ISO format (YYYY-MM-DD), with 00 for the month and/or day when they are missing (like a date field in MySQL). This way, the field can work with any database, and the data can be read, understood and edited directly and easily by a human using a simple client (like the mysql client, phpMyAdmin, etc.). That was a requirement. It can also be exported to Excel/CSV without any conversion. The disadvantage is that the format is not enforced (except in Django). Someone could write 'not a date' or make a mistake in the format and the DB will accept it (if you have an idea about this problem...).

This way it's also possible to do all of the special queries of a date field relatively easily. For queries with WHERE, the operators <, >, <=, >= and = work directly. IN and BETWEEN queries also work directly. For querying by day or month you just have to use EXTRACT (DAY|MONTH ...). Ordering also works directly. So I think it covers all the query needs with almost no complication.
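A small demonstration of why this works: ISO-formatted strings compare lexicographically in the same order as the dates they encode. The sketch below uses the sqlite3 module from the standard library rather than MySQL, with a made-up person table. SQLite has no EXTRACT, so substr() stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE person (name TEXT, birth VARCHAR(10))')
conn.executemany(
    'INSERT INTO person VALUES (?, ?)',
    [('a', '2010-00-00'), ('b', '2010-05-00'),
     ('c', '2010-05-04'), ('d', '2009-12-31')],
)

# Comparison operators work directly on the stored strings.
after = [row[0] for row in conn.execute(
    "SELECT name FROM person WHERE birth >= '2010-05-00' ORDER BY birth")]

# Querying by month: substr() replaces EXTRACT(MONTH ...) in this SQLite sketch.
in_may = [row[0] for row in conn.execute(
    "SELECT name FROM person WHERE substr(birth, 6, 2) = '05' ORDER BY birth")]

# Ordering also works directly; year-only values sort before fuller dates.
ordered = [row[0] for row in conn.execute(
    'SELECT name FROM person ORDER BY birth')]

print(after, in_may, ordered)
```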

On the Django side, I did two things. First, I created a PartialDate object that looks mostly like datetime.date but supports dates without a month and/or day. Inside this object I use a datetime.datetime object to keep the date. I'm using the hours and minutes as flags: when set to 1, they indicate that the month and day, respectively, are valid. It's the same idea that steveha proposed but with a different implementation (and only on the client side). Using a datetime.datetime object gives me a lot of nice features for working with dates (validation, comparison, etc.).
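To give a rough idea of the shape of such an object, here is a simplified sketch. It is not the actual implementation, and the regular expression and method set are assumptions. Parsing through a strict pattern also rejects strings like 'not a date' before they reach the database.

```python
import re
from datetime import datetime

_ISO_PARTIAL = re.compile(r'(\d{4})-(\d{2})-(\d{2})')


class PartialDate:
    """Sketch: a datetime whose hour/minute flag month/day validity."""

    def __init__(self, text):
        match = _ISO_PARTIAL.fullmatch(text)
        if match is None:
            raise ValueError('expected YYYY-MM-DD, got %r' % (text,))
        year, month, day = (int(group) for group in match.groups())
        if month == 0 and day != 0:
            raise ValueError('a day requires a month')
        # hour == 1 means the month is known; minute == 1 means the day is known.
        # datetime() itself rejects impossible dates such as 2010-02-30.
        self._dt = datetime(year, month or 1, day or 1,
                            hour=int(month != 0), minute=int(day != 0))

    def __str__(self):
        month = self._dt.month if self._dt.hour else 0
        day = self._dt.day if self._dt.minute else 0
        return '%04d-%02d-%02d' % (self._dt.year, month, day)

    def __eq__(self, other):
        return self._dt == other._dt

    def __lt__(self, other):
        return self._dt < other._dt
```

Because the flags live in the hour and minute, sorting the wrapped datetime values gives the same order as sorting the stored strings: 2010-00-00 comes before 2010-01-00, which comes before 2010-01-01.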

Secondly, I created a PartialDateField that mostly deals with the conversion between the PartialDate object and the database.

So far, it works pretty well (I have mostly finished my extensive unit tests).

Etienne 2010-06-12 03:50:38

This seems like a very good solution. I don't like the trick of using the hours and minutes as a flag, because I am worried that someday you might need to use them to actually store hours and minutes, and then you will have a problem. But if that day never comes, then I am worried about nothing!

steveha 2010-06-15 03:32:18

If you ever need to extend this to handle hours, minutes, seconds, etc., I suggest you use the RFC 3339 standard, as it shares the same virtues as your solution (works with any database, human readable, etc.): http://www.ietf.org/rfc/rfc3339.txt

steveha 2010-06-15 03:36:16

Thanks for the info about the RFC! I agree that using the hours and minutes as flags is the only hackish part of my solution, but it makes things so simple for sorting, etc. If I need time in the future, I will create a new PartialDateTime object (not a PartialDate) with another implementation. But honestly, I have a hard time seeing the use case for this.

Etienne 2010-06-15 15:35:02
