views:

73

answers:

1

I have a table T1, it contains a NAME value (not unique), and a date range (D1 and D2 which are dates) When NAME are the same, we make a union of the date ranges (e.g. B).

But as a result (X), we need to make intersection of all the date ranges

Edit: Table T1

NAME | D1       | D2
A    | 20100101 | 20101211
B    | 20100120 | 20100415
B    | 20100510 | 20101230
C    | 20100313 | 20100610

Result :

X    | 20100313 | 20100415
X    | 20100510 | 20100610

Visually, this will give the following :

NAME        : date range
A           : [-----------------------]-----
B           : --[----]----------------------
B           : ----------[---------------]---
C           : -----[--------]---------------

Result :

X           : -----[-]----------------------
X           : ----------[---]---------------

Any idea how to get that using SQL / PL SQL ?

+4  A: 

Hi,

here is a quick solution (may not be the most efficient):

SQL> CREATE TABLE myData AS
  2  SELECT 'A' name, date'2010-01-01' d1, date'2010-12-11' d2 FROM DUAL
  3  UNION ALL SELECT 'B', date'2010-01-20', date'2010-04-15' FROM DUAL
  4  UNION ALL SELECT 'B', date'2010-05-10', date'2010-12-30' FROM DUAL
  5  UNION ALL SELECT 'C', date'2010-03-13', date'2010-06-10' FROM DUAL;

Table created

SQL> WITH segments AS (
  2  SELECT dat seg_low, lead(dat) over(ORDER BY dat) seg_high
  3    FROM (SELECT d1 dat FROM myData
  4           UNION
  5           SELECT d2 dat FROM myData)
  6  )
  7  SELECT s.seg_low, s.seg_high
  8    FROM segments s
  9    JOIN myData m ON s.seg_high > m.d1
 10                 AND s.seg_low < m.d2
 11   GROUP BY s.seg_low, s.seg_high
 12  HAVING COUNT(DISTINCT NAME) = 3;

SEG_LOW     SEG_HIGH
----------- -----------
13/03/2010  15/04/2010
10/05/2010  10/06/2010

I build all the possible successive date ranges and join this "calendar" with the sample data. This will list all ranges that have 3 values. You may need to merge the result if you add rows:

SQL> insert into mydata values ('B',date'2010-04-15',date'2010-04-16');

1 row inserted

SQL> WITH segments AS (
  2  SELECT dat seg_low, lead(dat) over(ORDER BY dat) seg_high
  3    FROM (SELECT d1 dat FROM myData
  4           UNION
  5           SELECT d2 dat FROM myData)
  6  )
  7  SELECT MIN(seg_low), MAX(seg_high)
  8    FROM (SELECT seg_low, seg_high, SUM(gap) over(ORDER BY seg_low) grp
  9             FROM (SELECT s.seg_low, s.seg_high,
 10                           CASE
 11                              WHEN s.seg_low
 12                                   = lag(s.seg_high) over(ORDER BY s.seg_low)
 13                              THEN 0
 14                              ELSE 1
 15                           END gap
 16                      FROM segments s
 17                      JOIN myData m ON s.seg_high > m.d1
 18                                   AND s.seg_low < m.d2
 19                     GROUP BY s.seg_low, s.seg_high
 20                    HAVING COUNT(DISTINCT NAME) = 3))
 21   GROUP BY grp;

MIN(SEG_LOW) MAX(SEG_HIGH)
------------ -------------
13/03/2010   16/04/2010
10/05/2010   10/06/2010
Vincent Malgrat
Exactly what i needed ! Merci encore Vincent ;)i suppose i could replace "3" with (select count(distinct NAME) from myData ) ?
guigui42
i just saw your EDIT, and i dont really understand what you mean by "if i add a row". In my data table, i might have unlimited names (e.g. A B C D E F G ... ) each might have duplicate rows (1, 2 or more lines with same name) with consecutive date ranges (no overlap, but might have gaps in between, like in my exemple B). So, will your second code snippet work for all these cases ? i m still trying to figure out how your analystics work in your query. Thanks again !
guigui42
@guigui42: the first query will be correct, but the result may contain consecutive intervals (for example in this case `2010-03-13 2010-04-15` and `2010-04-15 2010-04-16`). The second query will merge these intervals.
Vincent Malgrat