tags:

views:

149

answers:

3
site_id | start_date | end_date
      1 | oct  1, 08 | oct  2, 08
      1 | oct  2, 08 | oct  3, 08
 ...
      1 | oct 30, 08 | oct 31, 08
      2 | oct  1, 08 | oct  2, 08
      2 | oct  2, 08 | oct  3, 08
 ...
      2 | oct 30, 08 | oct 31, 08

I have a table that contains 1 record per site per day of the month (per month of the year). I need to be able to determine if a site for a given month has at least 15 contiguous records, and I need to know the start and end date of that series of contiguous days. I can do this in a stored procedure, but I was hoping this could be accomplished in a single query. I am dealing with a fairly large dataset, at least 30 million records per month.

Example Results:

site_id | contiguous_start_date | contiguous_end_date
      1 | oct 5, 2008           | oct 20, 2008
      2 | oct 10                | oct 30, 2008
      3 | oct 1                 | oct 31, 2008

thanks for your help!

+1  A: 

This is definitely very possible. I solved a similar problem in SQL Server a couple of months ago. I know nothing of Oracle syntax, so I'm afraid I can't convert if for you, but if you're solid with Oracle, this should be enough to get you there.

Aaron Alton
A: 

Your database structure is not suitable for the business logic you have:

  • end_date is always the next day after start_date then why you have to store it in the db?
  • I see that in the data sample you gave there is no spaces in dates range for a single site. That means that don't have to store the all dates period just start and stop date.

30 millions records per month is really table for the query you have to write. Do a structural refactoring of this table is my advice.

Bogdan Gusiev
+2  A: 

Here is an example of how to do such a query:

SQL> create table t (site_id,start_date,end_date)
  2  as
  3  select 1, date '2008-10-01', date '2008-10-02' from dual union all
  4  select 1, date '2008-10-02', date '2008-10-03' from dual union all
  5  select 1, date '2008-10-03', date '2008-10-30' from dual union all
  6  select 1, date '2008-10-30', date '2008-10-31' from dual union all
  7  select 2, date '2008-10-01', date '2008-10-02' from dual union all
  8  select 2, date '2008-10-02', date '2008-10-03' from dual union all
  9  select 2, date '2008-10-03', date '2008-10-04' from dual union all
 10  select 2, date '2008-10-04', date '2008-10-05' from dual union all
 11  select 2, date '2008-10-05', date '2008-10-06' from dual union all
 12  select 2, date '2008-10-06', date '2008-10-07' from dual union all
 13  select 2, date '2008-10-07', date '2008-10-08' from dual union all
 14  select 2, date '2008-10-08', date '2008-10-09' from dual union all
 15  select 2, date '2008-10-09', date '2008-10-10' from dual union all
 16  select 2, date '2008-10-10', date '2008-10-11' from dual union all
 17  select 2, date '2008-10-11', date '2008-10-12' from dual union all
 18  select 2, date '2008-10-12', date '2008-10-13' from dual union all
 19  select 2, date '2008-10-13', date '2008-10-14' from dual union all
 20  select 2, date '2008-10-14', date '2008-10-15' from dual union all
 21  select 2, date '2008-10-15', date '2008-10-16' from dual union all
 22  select 2, date '2008-10-16', date '2008-10-17' from dual union all
 23  select 2, date '2008-10-17', date '2008-10-18' from dual union all
 24  select 2, date '2008-10-18', date '2008-10-19' from dual union all
 25  select 2, date '2008-10-19', date '2008-10-20' from dual union all
 26  select 3, date '2008-10-01', date '2008-10-02' from dual union all
 27  select 3, date '2008-10-02', date '2008-10-03' from dual union all
 28  select 3, date '2008-10-03', date '2008-10-04' from dual union all
 29  select 3, date '2008-10-04', date '2008-10-05' from dual union all
 30  select 3, date '2008-10-05', date '2008-10-06' from dual union all
 31  select 3, date '2008-10-06', date '2008-10-07' from dual union all
 32  select 3, date '2008-10-07', date '2008-10-08' from dual union all
 33  select 3, date '2008-10-08', date '2008-10-09' from dual union all
 34  select 3, date '2008-10-09', date '2008-10-10' from dual union all
 35  select 3, date '2008-10-30', date '2008-10-31' from dual
 36  /

Tabel is aangemaakt.

And then the query:

SQL> select site_id
  2       , min(start_date) contiguous_start_date
  3       , max(end_date) contiguous_end_date
  4       , count(*) number_of_contiguous_records
  5    from ( select site_id
  6                , start_date
  7                , end_date
  8                , max(rn) over (partition by site_id order by start_date) maxrn
  9             from ( select site_id
 10                         , start_date
 11                         , end_date
 12                         , case lag(end_date) over (partition by site_id order by start_date)
 13                             when start_date then null
 14                             else rownum
 15                           end rn
 16                      from t
 17                  )
 18          )
 19   group by site_id
 20       , maxrn
 21   order by site_id
 22       , contiguous_start_date
 23  /

And the results:

   SITE_ID CONTIGUOUS_START_DA CONTIGUOUS_END_DATE NUMBER_OF_CONTIGUOUS_RECORDS
---------- ------------------- ------------------- ----------------------------
         1 01-10-2008 00:00:00 31-10-2008 00:00:00                            4
         2 01-10-2008 00:00:00 20-10-2008 00:00:00                           19
         3 01-10-2008 00:00:00 10-10-2008 00:00:00                            9
         3 30-10-2008 00:00:00 31-10-2008 00:00:00                            1

4 rijen zijn geselecteerd.

Regards, Rob.

Rob van Wijk
Clever approach to solving problem. Thanks.
BigCanOfTuna
+1 Cool, lag(end_date) = start_date.
Andomar