views:

684

answers:

2

I have a situation which I have solved in two different ways, but was wondering what people thought about the options, and if they have any other alternatives...

The system is processing "intervals" of data.

  • All the data is allocated to an "interval"
  • The interval is represented by an "interval_start" DATETIME in the Fact table
  • A Dimenstion table holds the duration of the "interval" (can be different for different entities)
  • Also in the Dimension table is a "mode" flag


The following rules are needed in a SQL Query...

Mode 0
If a record entry is in the fact table, it can be processed.
(No restrictions.)

Mode 1
A record is only valid for processing if the "interval_start" is in the past. (Intervals of time that have already begun, but not necessarily finished.)

Mode 2
A record is only valid for processing if the whole "interval" is in the past. (Intervals of time that have finished.)


The first WHERE clause created for this was as follows...

WHERE
    getDate() >=
        CASE [table].mode
            WHEN 0 THEN 0
            WHEN 1 THEN [log].interval_start
            WHEN 2 THEN [log].interval_start + [table].interval_period
            ELSE        NULL
        END


There was a concern that this would obsfuscate indexes from optimising the clause. The alternative was to use multiple AND/OR conditions.

WHERE
   ([table].mode = 0 AND getDate() >= 0)
OR ([table].mode = 1 AND getDate() >= [log].interval_start)
OR ([table].mode = 2 AND getDate() >= [log].interval_start + [table].interval_period)


Obviously, which will perform best will depend on the data and indexes, etc. But does anyone have any opion or alternative for what I'm currently describing as "conditional conditions"? :)

Cheers, Mat.

+2  A: 

How about:

WHERE
     interval_start <= CASE mode
                         WHEN 0 THEN '9999-12-31'
                         WHEN 1 THEN GETDATE()
                         WHEN 2 THEN GETDATE() - interval_period
                       END
Tom H.
We thought about that refactor, on the assumption that there would be an improvement by resticting the CASE statement to use ONLY fields from the Dimension table. The benefit was low and we found that certain 'lesser employees' [grin] couldn't get their heads around what getDate()-interval_period acheived. We may still go that way, but with lots of comments :)
Dems
@Tom: This is months old, but I have to point out that your code fails in most of the world outside the USA if [mode] = 0 for some row. Unbelievably, in locales that use day-month-year as the default date format, 'YYYY-AA-BB' is interpreted as month B, day A, if it's implicitly cast to DATETIME or SMALLDATETIME as here because DATETIME is the type of the CASE expression. Adding more craziness, this wouldn't happen if you used SYSDATETIME (and replaced the subtraction with DATEDIFF). That would type the CASE expression as DATETIME2, which interprets YYYY-MM-DD in all locales.
Steve Kass
+2  A: 

Another option would be to run three separate queries and UNION them together. Depending on your index and table structure, that might lead to more efficient SQL than OR-ing together the clauses in the WHERE clause or using CASE statements, although it would purely be gaming the optimizer.

I'd test all three approaches and pick the one that works best for you.

mwigdahl
Considering I'm an advocate of UNIONing several simpler queries to avoid a much slower single query, I'm ashamed I didn't think of it in this case. Thank you for the reminder, I am currently assaulting my own head with a large wet fish as a punishment...
Dems
:) I'd be interested to know whether it works for you (the UNIONs, not the slapping with a wet fish...) It would be nice to know whether that should be the default approach for most queries of this type or whether it's simply one more option in the bag of tricks.
mwigdahl
the dataset is very small at present, and I expect the UNION to be slower. But I also expect the dataset to grow by several orders of magnitude and then I think the UNION approach may be better. Assuming I remember over the next few weeks, I'll add another comment with my findings :) P.S. The fish was a Tuna, it hurt a lot due to it's sheer size, but tasted great on wholemeal bread with a little ground sea salt...
Dems
The CASE options were extremely poor (mine worst, Tom's next) and were worst in all cases. The AND/OR logic was best on smaller sets, the UNION best on larger sets. (AND/OR introducing a small overhead per row, UNION introducing a relatively significant overhead per query being UNIONed. Where the tipping point lies would depend on the data, the logic and the size of the sets)
Dems