ansaurus

Question

How to effectively do database as-of queries?

Answer 1

+1 A:

As-of queries are easier when each row has a start and an end time. Storing the end time in the table would be most efficient, but if that is hard to maintain, you can derive it with a query like:

select 
    ThisCar.CarId
,   StartTime = ThisCar.VersionTime
,   EndTime = NextCar.VersionTime
from Cars ThisCar
left join Cars NextCar
    on NextCar.CarId = ThisCar.CarId
    and ThisCar.VersionTime < NextCar.VersionTime
left join Cars BetweenCar
    on BetweenCar.CarId = ThisCar.CarId
    and ThisCar.VersionTime < BetweenCar.VersionTime
    and BetweenCar.VersionTime < NextCar.VersionTime
where BetweenCar.CarId is null

You can store this in a view. Say the view is called vwCars, you can select a car for a particular date like:

select * 
from vwCars
where StartTime <= '2009-06-12 09:15' 
and ('2009-06-12 09:15' < EndTime or EndTime is null)

You could also wrap this in a table-valued function that takes the date as a parameter, but that might carry a steep performance penalty.
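
A minimal runnable sketch of the view approach above, using Python's built-in sqlite3 module instead of SQL Server. The Colour column, the sample rows and the helper name car_as_of are assumptions made up for illustration. The view uses AS aliases because SQLite does not accept the T-SQL "alias = expression" form, and timestamps are stored as ISO-formatted text so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (CarId INTEGER, VersionTime TEXT, Colour TEXT);
-- Hypothetical sample data: three versions of one car.
INSERT INTO Cars VALUES
    (1, '2009-06-12 09:00', 'red'),
    (1, '2009-06-12 09:30', 'blue'),
    (1, '2009-06-12 10:00', 'black');

-- Each version's end time is the start time of the next version;
-- the BetweenCar join rejects pairs that have a version in between.
CREATE VIEW vwCars AS
SELECT ThisCar.CarId,
       ThisCar.Colour,
       ThisCar.VersionTime AS StartTime,
       NextCar.VersionTime AS EndTime
FROM Cars ThisCar
LEFT JOIN Cars NextCar
    ON NextCar.CarId = ThisCar.CarId
   AND ThisCar.VersionTime < NextCar.VersionTime
LEFT JOIN Cars BetweenCar
    ON BetweenCar.CarId = ThisCar.CarId
   AND ThisCar.VersionTime < BetweenCar.VersionTime
   AND BetweenCar.VersionTime < NextCar.VersionTime
WHERE BetweenCar.CarId IS NULL;
""")

def car_as_of(conn, as_of):
    """Return the rows of vwCars that were valid at the given time."""
    return conn.execute(
        """SELECT CarId, Colour, StartTime, EndTime
           FROM vwCars
           WHERE StartTime <= ?
             AND (? < EndTime OR EndTime IS NULL)""",
        (as_of, as_of),
    ).fetchall()

print(car_as_of(conn, "2009-06-12 09:50"))
```

The as-of time is supplied as a filter on the view rather than inside it, so the view itself needs no parameter.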

Andomar 2009-06-12 12:08:11

Your query is more efficient (fewer table scans), but it doesn't perform an as-of query. It only gets the latest version, rather than the version as of 09:50. We might be able to take some ideas from your query though, so thanks.

ng5000 2009-06-12 12:34:33

We won't be able to use views, as we need to pass the time component into the query. Stored procedures may be an option, but since we have to join to other tables we might need to look at table-valued functions.

ng5000 2009-06-12 12:39:15

Edited with new approach for as-of dates.

Andomar 2009-06-12 12:54:42

Your query isn't pulling back the results I wanted as per my question - thanks anyway

ng5000 2009-06-12 13:04:48

Answer 2

+1 A:

Depending on your application, you might want to push the versioning to secondary auditing tables that have both a start date and a nullable end date. I found that in a high-traffic OLTP system the versioning approach can become fairly expensive, and if most of your reads pull the latest version then this might be beneficial.

By using a start and end date you can query the ancillary tables for rows where the date falls between start and end, or is greater than start when the end date is null.
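
A rough sketch of that lookup, run through Python's built-in sqlite3 module. The CarsHistory table, its columns, the sample rows and the function names are all hypothetical; a NULL EndTime is assumed to mark the version that is still current, and periods are treated as half-open (start included, end excluded).

```python
import sqlite3

def build_history():
    """Create a hypothetical audit table with a start and a nullable end."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE CarsHistory (
        CarId     INTEGER NOT NULL,
        Colour    TEXT    NOT NULL,
        StartTime TEXT    NOT NULL,
        EndTime   TEXT             -- NULL: this version is still current
    );
    INSERT INTO CarsHistory VALUES
        (1, 'red',   '2009-06-12 09:00', '2009-06-12 10:00'),
        (1, 'black', '2009-06-12 10:00', NULL);
    """)
    return conn

def colour_as_of(conn, car_id, as_of):
    """Return the colour valid at as_of, or None if no version covers it."""
    row = conn.execute(
        """SELECT Colour
           FROM CarsHistory
           WHERE CarId = ?
             AND StartTime <= ?
             AND (EndTime IS NULL OR ? < EndTime)""",
        (car_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

history = build_history()
print(colour_as_of(history, 1, "2009-06-12 09:50"))
```

No self-join is needed here because each row already carries its own period.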

JoshBerke 2009-06-12 13:00:32

Answer 3

+1 A:

Storing the end time in the table for each situation does make the queries easier to express, but creates the problem of maintaining integrity rules such as "no two distinct situations for the same car (wheel/...) may overlap" (still reasonably doable) and "there cannot be holes in the time series of distinct situations of any single (car/wheel/...)" (more troublesome).

Not storing the end time in the table for each situation forces you to write self-joins each time you need to invoke an Allen operator (overlaps, merges, contains, ...) on the time intervals implied by the only time column you have.
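
To make the two integrity rules concrete, here is a sketch of queries that detect violations after the fact, run through Python's built-in sqlite3 module. Everything here is an assumption for illustration: the CarVersions table, the half-open period convention (start included, end excluded), the far-future sentinel that stands in for a NULL end, and the sample rows. It only reports problems and does not prevent them.

```python
import sqlite3

FAR_FUTURE = "9999-12-31 23:59"  # stands in for an open-ended (NULL) EndTime

# Two periods of the same car overlap when each starts before the other ends.
OVERLAPS_SQL = """
SELECT a.CarId, a.StartTime, b.StartTime
FROM CarVersions a
JOIN CarVersions b
  ON b.CarId = a.CarId
 AND a.rowid < b.rowid
 AND a.StartTime < COALESCE(b.EndTime, :far)
 AND b.StartTime < COALESCE(a.EndTime, :far)
ORDER BY a.CarId, a.StartTime, b.StartTime
"""

# A hole starts at an EndTime that no period covers although a later one exists.
HOLES_SQL = """
SELECT a.CarId, a.EndTime
FROM CarVersions a
WHERE a.EndTime IS NOT NULL
  AND EXISTS (SELECT 1 FROM CarVersions later
              WHERE later.CarId = a.CarId
                AND later.StartTime > a.EndTime)
  AND NOT EXISTS (SELECT 1 FROM CarVersions c
                  WHERE c.CarId = a.CarId
                    AND c.StartTime <= a.EndTime
                    AND a.EndTime < COALESCE(c.EndTime, :far))
ORDER BY a.CarId, a.EndTime
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarVersions (CarId INTEGER, StartTime TEXT, EndTime TEXT);
INSERT INTO CarVersions VALUES
    (1, '2009-06-12 09:00', '2009-06-12 09:30'),
    (1, '2009-06-12 09:20', '2009-06-12 10:00'),  -- overlaps the row above
    (1, '2009-06-12 10:15', NULL),                -- leaves a hole from 10:00
    (2, '2009-06-12 09:00', '2009-06-12 09:30'),
    (2, '2009-06-12 09:30', NULL);                -- car 2 is clean
""")

overlaps = conn.execute(OVERLAPS_SQL, {"far": FAR_FUTURE}).fetchall()
holes = conn.execute(HOLES_SQL, {"far": FAR_FUTURE}).fetchall()
print(overlaps)
print(holes)
```

Even with the end time stored, both checks still need a self-join or correlated subquery, which is part of why these rules are awkward to enforce declaratively.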

SQL is just a nightmare if you need to do this kind of temporal stuff.

And incidentally, even just accurately formulating these queries in natural language is a nightmare. To illustrate: you said that you needed "as-of" queries, but your examples excluded the situations which were "as-of" 10:05 (wheelVer 3) and 10:00 (color black). This despite the fact that those situations are definitely also "as-of" 09:50.

You may be interested in a read of "Temporal Data and the Relational Model". Keep in mind that the treatment in this book is entirely abstract, since, as the book itself says, "this book is not about technology available anywhere today".

The other standard textbook on the subject (I'm told) is one by Snodgrass, but I don't know the title. I'm told the authors of these two books take completely opposite stances on what the solution ought to be.

2009-06-12 13:44:21

Answer 4

+3 A:

This kind of table is known as a valid-time state table in the literature. It is universally accepted that each row should model a period by having a start date and an end date. Basically, the unit of work in SQL is the row, and a row should completely define the entity; by having just one date per row, not only do your queries become more complex, but your design is compromised by splitting subatomic parts onto different rows.

As mentioned by Erwin Smout, one of the definitive books on the subject is:

Richard T. Snodgrass (1999). Developing Time-Oriented Database Applications in SQL

It's out of print but happily is available as a free PDF download.

I have actually read it and have implemented many of the concepts. Much of the text is in ISO/ANSI Standard SQL-92, and although some of the examples have also been implemented in proprietary SQL syntaxes, including SQL Server (also available as downloads), I found the conceptual information much more useful.

Joe Celko also has a book, 'Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL', largely derived from Snodgrass's work, though I have to say where the two diverge I find Snodgrass's approaches preferable.

I concur this stuff is hard to implement in the SQL products we currently have. We think long and hard before making data temporal; if we can get away with merely 'historical' then we will. Much of the temporal functionality in SQL-92 is missing from SQL Server, e.g. INTERVAL, OVERLAPS, etc. Some things as fundamental as sequenced 'primary keys', which ensure periods do not overlap, cannot be implemented using CHECK constraints in SQL Server, necessitating triggers and/or UDFs.
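
As an illustration of the trigger route, here is a sketch that rejects overlapping periods at insert time. It uses SQLite (via Python's built-in sqlite3 module) rather than SQL Server, so the trigger syntax differs from T-SQL. The CarVersions table, the trigger name, the far-future sentinel for an open end and the try_insert helper are assumptions. It covers INSERT only; a real sequenced key would also have to guard UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarVersions (
    CarId     INTEGER NOT NULL,
    StartTime TEXT    NOT NULL,
    EndTime   TEXT               -- NULL: open-ended period
);

-- Abort the insert if the new period overlaps an existing period of the
-- same car. Periods are half-open: start included, end excluded.
CREATE TRIGGER CarVersions_no_overlap
BEFORE INSERT ON CarVersions
BEGIN
    SELECT RAISE(ABORT, 'period overlaps an existing version')
    WHERE EXISTS (
        SELECT 1
        FROM CarVersions v
        WHERE v.CarId = NEW.CarId
          AND v.StartTime < COALESCE(NEW.EndTime, '9999-12-31 23:59')
          AND NEW.StartTime < COALESCE(v.EndTime, '9999-12-31 23:59')
    );
END;
""")

def try_insert(conn, car_id, start, end):
    """Insert one period; return False if the trigger rejects it."""
    try:
        conn.execute(
            "INSERT INTO CarVersions (CarId, StartTime, EndTime) VALUES (?, ?, ?)",
            (car_id, start, end),
        )
        return True
    except sqlite3.DatabaseError:
        return False

print(try_insert(conn, 1, "2009-06-12 09:00", "2009-06-12 10:00"))
print(try_insert(conn, 1, "2009-06-12 09:30", None))
```

The second insert is refused because its period starts inside the first one.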

Snodgrass's book is based on his work for SQL3, a proposed extension to Standard SQL to provide much better support for temporal databases, though sadly this seems to have been effectively shelved years ago :(

onedaywhen 2009-06-12 14:33:22

Answer 5

+1 A:

This query will return duplicates if you have two rows with the same exact version time for a single car ID, but that's a matter of defining what you consider to be the "latest" one in that situation. I haven't had a chance to test this yet, but I think it will give you what you need. It's at least pretty close.

SELECT
     C.car_id,
     C.car_version,
     C.colour,
     C.version_time AS car_version_time,
     W.wheel_id,
     W.wheel_version,
     W.version_time AS wheel_version_time
FROM
     Cars C
LEFT OUTER JOIN Cars C2 ON
     C2.car_id = C.car_id AND
     C2.version_time <= @as_of_time AND
     C2.version_time > C.version_time
LEFT OUTER JOIN Wheels W ON
     W.car_id = C.car_id AND
     W.version_time <= @as_of_time
LEFT OUTER JOIN Wheels W2 ON
     W2.car_id = C.car_id AND
     W2.wheel_id = W.wheel_id AND
     W2.version_time <= @as_of_time AND
     W2.version_time > W.version_time
WHERE
     C.version_time <= @as_of_time AND
     C2.car_id IS NULL AND
     W2.wheel_id IS NULL
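
For anyone who wants to try the query above without a SQL Server instance, here is a sketch that runs it in SQLite through Python's built-in sqlite3 module. The table layouts and the sample rows are assumptions, not the data from the question. The @as_of_time variable is written as the named parameter :as_of_time, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

AS_OF_SQL = """
SELECT C.car_id, C.car_version, C.colour, C.version_time,
       W.wheel_id, W.wheel_version, W.version_time
FROM Cars C
LEFT OUTER JOIN Cars C2
    ON C2.car_id = C.car_id
   AND C2.version_time <= :as_of_time
   AND C2.version_time > C.version_time
LEFT OUTER JOIN Wheels W
    ON W.car_id = C.car_id
   AND W.version_time <= :as_of_time
LEFT OUTER JOIN Wheels W2
    ON W2.car_id = C.car_id
   AND W2.wheel_id = W.wheel_id
   AND W2.version_time <= :as_of_time
   AND W2.version_time > W.version_time
WHERE C.version_time <= :as_of_time
  AND C2.car_id IS NULL      -- no later car version at or before the as-of time
  AND W2.wheel_id IS NULL    -- no later wheel version at or before the as-of time
ORDER BY C.car_id, W.wheel_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (car_id INTEGER, car_version INTEGER,
                   colour TEXT, version_time TEXT);
CREATE TABLE Wheels (car_id INTEGER, wheel_id INTEGER,
                     wheel_version INTEGER, version_time TEXT);
-- Hypothetical sample data.
INSERT INTO Cars VALUES
    (1, 1, 'red',   '2009-06-12 09:00'),
    (1, 2, 'blue',  '2009-06-12 09:30'),
    (1, 3, 'black', '2009-06-12 10:00');
INSERT INTO Wheels VALUES
    (1, 1, 1, '2009-06-12 09:00'),
    (1, 1, 2, '2009-06-12 09:40'),
    (1, 1, 3, '2009-06-12 10:05'),
    (1, 2, 1, '2009-06-12 09:00');
""")

def cars_as_of(conn, as_of_time):
    """Run the as-of query for one point in time."""
    return conn.execute(AS_OF_SQL, {"as_of_time": as_of_time}).fetchall()

for row in cars_as_of(conn, "2009-06-12 09:50"):
    print(row)
```

With this sample data, an as-of time of 09:50 yields the blue car version paired with version 2 of wheel 1 and version 1 of wheel 2; the black car and wheel version 3 are excluded because they were created later.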

Tom H. 2009-06-12 15:00:04

A few minor changes to unify naming (e.g. car_id to CarId) and your query works.

ng5000 2009-06-12 15:14:03
