views:

118

answers:

5

Hi, I have a database which needs to store year ranges (such as lifespan) which in some instances aren't very specific. For example, as well as the usual '1960', we have 'c. 1960' for "around 1960", and 1960s for in the decade 1960, as well as 'nd' for no date which is distinctly different from NULL which would be until present.

What would be the best way to store these?

I was thinking of possibly storing the year as a string and appending 's' for decades and 'c' for circa, with 'nd' for no date... but this feels just wrong. Also, when sorting chronologically I need to sort by inserting 1960s items before c. 1960 items, which in turn are before 1960 items.

How do I do this properly?

Some example date ranges:

1957
c. 1957
c. 1957 - 1965
c. 1957 - 1960s
c. 1957 - present
c. 1957 - nd

Thanks for your help!

+8  A: 

You can insert the earliest and the latest date possible for the range.

Like, exact date will be

Jan 1, 1960   Jan 1, 1960

, c. 1960 will be

Jan 1, 1960   Dec 31, 1960

, 1960s will be

Jan 1, 1960   Dec 31, 1969

etc.

Then you can order by the first date and the datediff:

SELECT  *
FROM    mytable
ORDER BY
        start_date, DATEDIFF(end_date, start_date) DESC

Alternatively, you can store the expected value and the deviation instead of the start and end dates.

Quassnoi
Awesome. Thank you!
Tricky
+1  A: 

This sounds like a 'fuzzy logic' type problem. I would tackle it by using multiple columns:

  • Earliest Possible date
  • Earliest Certain date
  • Latest Certain Date
  • Latest Possible Date

Your examples might be represented as

  • 1957,1957,1957,1957
  • 1955,1957,1957,1959
  • 1955,1959,1965,1965
  • 1955,1959,1960,1969
  • 1955,1959,NULL,NULL
  • 1955,1959,9999,9999

I'm assuming 'c' means +/-2 years, a decade runs from '0 to '9, and 9999 is a flag value for 'nd'

Any use?

Max
A: 

I think Quassnoi answer is good, alternativly you could generate a time dimension then reference it in a simular way (though your dimention may not need as many rows)?

Mr Shoubs
A: 

You need to predetermine what your circa structure is OR allow for varience. Circa is often associated with classification of artifacts (books, photo's, writings) and other entities (unknown exact birth/death dates) where wide varience might be found. Keep in mind that reclassification of artifacts often occurs as more information is gained. Thus what Max has defined works, but I would also add a definative classification date and perhaps a column with a description and/or name of the classifier/entity defining the classification. You can determine if you need historical records of reclassification and how to handle that.

One other item of note regarding things of this nature - what calendar was in place defining the actual date for a day such as the The date Oct 4 (Julian) was followed by the date Oct 15 (Gregorian) which was a change made in 1582 and the similar change in 1751 that change Britian and its then colonies to the Gregorian calendar. Thus you need to determine the calendar in use which can be translated to present day calendars.

The point of this, is you MIGHT also need an identifier of the calendar in use assiciated with the determination and cite that as well.

Additionally, some events are associated with months denoted as "first month, second month etc." which, might be confusing as for instance the Quakers change where "first month" in 1751 was March but in 1752 it was January - especially noted when genealogy is the subject matter (as you imply) and births/deaths are the events.

For fun, toss in Caeser and Calends (first day of March) and his reform of the leap year/day where February 24 was the leap day as "second 6th day preceding the Calends" so there were two February 24ths (two 6th days) in leap years.

I point this out because the number of days in a leap year may in some instances vary your periods.

So, use a structure such as Max and/or Quassnoi propose, but possibly a bit more for the deliniation.

Citation for calendars and understanding: What is a calendar

Mark Schultheiss
Max
A: 

"How do I store (and sort) vague date ranges?"

(A) Let's analyze with an above-average level of scrutiny what it means to be a date range. This means that there is a precise start date and a precise end date of the range.

(B) Let's analyze with an above-average level of scrutiny what it might mean to be a "vague" date range. To me, it seems like this is some kind of date range in which either the start date, or the end date, or both, are not precisely known, but in which one (or both) of those are THEMSELVES a range.

(C) It is a known fact that the usual mathematical notion of "ordering" (aka "total order") does not usefully apply to the concept of "ranges" (how would you decide that 1-3 is either greater or smaller than 0-6 ?). It IS theoretically possible to define such an ordering, based on e.g. "order on start dates first, and within equal start dates, on end dates", but such an ordering is never useful.

(D) And if the "start date" and/or "end date" of a range are themselves "vague" (i.e. those components are ranges themselves), then it is obvious why it is insane to try and meaningfully order such ranges.

So the answer to the "(and sort)" part is : "You don't, because you can't (in a meaningful way)".

Erwin Smout