views:

133

answers:

2

I have a table with a start date range, an end date range, and a few other additional columns. On input of a new record, I want to automatically adjust any overlapping date ranges (shrinking them, splitting them, or deleting them to allow for the new input -- see algorithm below). I also want to ensure that no overlapping records can accidentally be inserted into this table.

I'm using Oracle and Java for my application code. How should I enforce the prevention of overlapping date ranges and also allow for automatically adjusting overlapping ranges? Should I create an AFTER INSERT trigger, with a dbms_lock to serialize access, to prevent the overlapping data. Then in Java, apply the logic to auto adjust everything? Or should that part be in PL/SQL in stored procedure call? This is something that we need for a couple other tables so it'd be nice to abstract.

If anyone has something like this already written, please share :)

I did find this reference: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:474221407101

Here's an example of how each of the 4 overlapping cases should be handled for adjustment on insert:

= Example 1 =
In DB (Start, End, Value):
(0, 10, 'X')
**(30, 100, 'Z')
(200, 500, 'Y')

Input
(20, 50, 'A')
Gives
(0, 10, 'X')
**(20, 50, 'A')
**(51, 100, 'Z')
(200, 500, 'Y')


= Example 2 =
In DB (Start, End, Value):
(0, 10, 'X')
**(30, 100, 'Z')
(200, 500, 'Y')

Input 
(40, 80, 'A')
Gives
(0, 10, 'X')
**(30, 39, 'Z')
**(40, 80, 'A')
**(81, 100, 'Z')
(200, 500, 'Y')


= Example 3 =
In DB (Start, End, Value):
(0, 10, 'X')
**(30, 100, 'Z')
(200, 500, 'Y')

Input
(50, 120, 'A')
Gives
(0, 10, 'X')
**(30, 49, 'Z')
**(50, 120, 'A')
(200, 500, 'Y')


= Example 4 =
In DB (Start, End, Value):
(0, 10, 'X')
**(30, 100, 'Z')
(200, 500, 'Y')

Input
(20, 120, 'A')
Gives
(0, 10, 'X')
**(20, 120, 'A')
(200, 500, 'Y')

The algorithm is as follows:

given range = g; input range  = i; output range set = o

if i.start <= g.start
  if i.end >= g.end
    o_1 = i
  else
    o_1 = i
    o_2 = (i.end + 1, g.end)
else
  if i.end >= g.end
    o_1 = (g.start, i.start - 1)
    o_2 = i
  else
    o_1 = (g.start, i.start - 1)
    o_2 = i
    o_3 = (i.end + 1, g.end)
+2  A: 

I've generally seen data models like that having the starting point of the range being the only one being tracked, where the ending point is then implicit. So it'd be

CREATE TABLE MY_TABLE
(START_AT    NUMBER,
 VALUE       NUMBER,
 CONSTRAINT MY_TABLE_PK (START_AT)
);

If you need to present values in the existing format, you could use analytics and a materialized view, using LEAD(START_AT) OVER (ORDER BY START_AT) (I think it's right, but untested) to get the interpreted end value.

Adam Musch
@Adam: +1, although I would define the primary key as `(start_at)` so that you would have no ambiguity with two periods starting at the same time. Also the trade-off of such a model is that it is harder to query than the model with `(start, end)`.
Vincent Malgrat
My understanding (and limited) read was that each key had its own ranges. That appears to be correct, and similarly, the OP appears to have undefined ranges. If that's the case, the OP would also need to insert (START_AT, VALUE) of (:previous_logical_end+1, NULL) fencepost records. The materialized view suggestion was to help with the query question, although I don't think querying the base table itself is that bad, especially not with analytics. Editing answer.
Adam Musch
Adam, thanks for your reply. I did not think of such a solution. Prior to your suggestion, the data model looks something like (product_id, start_date, end_date, discount_value). I need to get the discount_value for a product on a date. The user inputs the start and end date of when to run the discount for a product, along with the discount amount.My specific example is a little more complicated, so I hope the product/discount example is a fair representation of my problem. Yes, there are undefined ranges.
Bradford
So, the only advantage of this method is that you wouldn't need the trigger, right? You would still need to do the work of cleaning up overlapping rates, but it could never overlap in the table.I mean that if (1, val1), (2, val2), (3, val3), (4, val4) exists and the user wants to insert 1 to 4 val100, then you would have to delete (1, val1), (2, val2), (3, val3), (4, val4) first before inserting (1, val100) and (5, null) assuming that (5, someval) did not already exist. This seems more complicated. Or am I missing something.
Bradford
The advantage is that you're not tracking the fencing (the range); you're only tracking the fenceposts (places where the ranges change). (Also, the model you're discussing now requires tracking the fenceposts individually for each product, which was how I originally interpreted the question.) The SQL for finding the discount for product X at date Y would then be:`SELECT DISCOUNT FROM (SELECT DISCOUNT, ROW_NUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) AS DISCOUNT_OFFSET FROM PRODUCT_DISCOUNT WHERE START_DATE >= [date y] AND PRODUCT = [product X]) WHERE DISCOUNT_OFFSET = 1`
Adam Musch
When the discount ends for a product, you put in a fence post with a zero discount.
Adam Musch
Adam, on insert, wouldn't you need something like:(1.) remove all start >= i.start and start <= i.end(2.) insert i.start w/ value(3.) z = record at i.end + 1(4.) if !z, then z.start = i.end + 1; z.value = 0(5.) remove all consecutive records where z.value = 0.p.s. I'm checking this as an answer.
Bradford
Step #5 in my last comment was just to reduce the number of records.
Bradford
@Bradford -- I don't think you ever need to remove any records. If running a discount of 20% for Product 123 beginning March 1 good through March 31, `insert values (123. '2010-03-01', .20)` to start it and `insert values (123, '2010-04-01', 0)` to end it. If you need to change the discount on a date, update the record; if you need to add something in the middle, like upping the discount from March 16th forward, it's `insert values (123, '2010-03-16', .30)`.
Adam Musch
The input is a range, e.g. from start date to end date I want to run the discount for product 123 at 5.50. Just using numbers and characters for this example, let's say you had [(30, A), (100, 0), (120, B), (141, C), (160, 0)]. Now you want to insert "from 60 to 130 I want to run with D". You would have to remove (100, 0) and (120, B) or querying for 125 would give you 'B' and not 'D'; querying for 105 would give you 0 and not 'D'.
Bradford
A: 

AskTom's article gives a good example of how it can be done, but note that this example locks the whole table which will severely impact concurrency of your application.

If concurrency is a matter for you, you should just add a sequence column (with ORDER option if you use RAC) and write a query like this:

SELECT  *
FROM    (
        SELECT  *, rownum AS rn
        FROM    mytable
        WHERE   start_date <= :date
                AND end_date >= :date
        ORDER BY
                seq DESC
        )
WHERE   rn = 1

to find out effective range (and other data) for a given date.

This will return the last inserted range containing the given date.

You can make this query a little bit more efficient by running a maintenance procedure that would get rid of the overlapping ranges on a timely basis (like described in the post) and rewriting the query like this:

SELECT  *
FROM    (
        SELECT  *, rownum AS rn2
        FROM    (
                SELECT  *
                FROM    (
                        SELECT  *, rownum AS rn
                        FROM    mytable
                        WHERE   seq <= :lseq
                                AND start_date <= :date
                                AND end_date >= :date
                        ORDER BY
                                start_date DESC
                        )
                WHERE   rn = 1
                UNION ALL
                SELECT  *
                FROM    (
                        SELECT  *, rownum AS rn
                        FROM    mytable
                        WHERE   seq > :lseq
                                AND start_date <= :date
                                AND end_date >= :date
                        ORDER BY
                                seq DESC
                        )
                WHERE   rn = 1
                )
        ORDER BY
                seq DESC
        )
WHERE   rn2 = 1

Create the indexes on start_date and seq for this to work fast.

The latter query will select the first matching range from the processed ranges (which are known to be non-overlapping), the first matching range from the unprocessed ranges (which are few) and of the two records, will select the one with the highest seq.

Quassnoi
I like not having to worry about locking, but I am having a hard time seeing how the last query will clean up the data since you have specify the :date. Also, the user needs to see the date ranges. It just seems like it would be hard for the user to determine which values are associated for each date range -- though, I didn't specify that part in my question.
Bradford