views:

387

answers:

4

I am having a table like this

ColumnId Intime                   Outtime
1        01/02/2009 10.00.000    01/02/2009 20.00.0000  
2        01/02/2009 2.00.000    01/02/2009 2.00.0000 
3        01/02/2009 2.00.000    01/02/2009 5.00.0000 
4        01/02/2009 3.3.0.000    01/02/2009 5.00.0000 
5        01/02/2009 10.00.000    01/02/2009 22.00.0000 
6        01/02/2009 3.00.000    01/02/2009 4.00.0000 

I am having columns and values like this. I like to find the overlapping records and how many overlapping records for the particular date. Overlapping for time from 1-24 in a day.

Note:- My table has millions of records.

for example in first value login an 10 and logged out 20. and in 5the record login at 10 and logged out at 22 so 5th overlapped with first. No Indices available in the table.

Please get me the answer for my query.

I need the query to execute in SQL Server 2005

+3  A: 

Out of the tip of my head, and assuming index on both columns, you could use something like this:

SELECT a.ColumnId
      ,a.InTime
      ,a.OutTime
      ,b.ColumnId AS OverlappingId
      ,b.InTime   AS OverlappingInTime
      ,b.OutTime  AS OverlappingOutTime
  FROM TimeTable  AS a
  JOIN TimeTable  AS b ON ((a.InTime  BETWEEN b.InTime AND b.OutTime)
                       OR  (a.OutTime BETWEEN b.InTime AND b.OutTime) 
                       OR  (a.InTime < b.InTime AND a.OutIme > b.OutTime))
                      AND  (a.ColumnId != b.ColumnId)

But I'm really not sure about the performance this query would have in a table with millions of records as you mention.

Edited to Add, and edited yet again:

After the comments of Vadim K., I noticed that the query I had wrote previously were missing a case when the overlapping were total, that is one range cover the entirely another one. Above is my revised query, and below the original one:

SELECT a.ColumnId 
      ,a.InTime 
      ,a.OutTime 
      ,b.ColumnId AS OverlappingId 
      ,b.InTime   AS OverlappingInTime 
      ,b.OutTime  AS OverlappingOutTime 
  FROM TimeTable  AS a 
  JOIN TimeTable  AS b ON ((a.InTime  BETWEEN b.InTime AND b.OutTime) 
                       OR  (a.OutTime BETWEEN b.InTime AND b.OutTime)) 
                      AND  (a.ColumnId != b.ColumnId) 

Using the question initial data for the a test run:

+--------+------------------+------------------+
|ColumnId| InTime           | OutTime          |
+--------+------------------+------------------+
|      1 | 01/02/2009 10:00 | 01/02/2009 20:00 |   
|      2 | 01/02/2009  2:00 | 01/02/2009  2:00 |  
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |  
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |  
|      5 | 01/02/2009 10:00 | 01/02/2009 22:00 |  
|      6 | 01/02/2009  3:00 | 01/02/2009  4:00 |  
+--------+------------------+------------------+

Running the original query we have the following result:

+--------+------------------+------------------+-------------+
|ColumnId| InTime           | OutTime          |OverlappingId|
+--------+------------------+------------------+-------------+
|      1 | 01/02/2009 10:00 | 01/02/2009 20:00 |           5 |
|      2 | 01/02/2009  2:00 | 01/02/2009  2:00 |           3 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           2 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           4 |
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |           3 |
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |           6 |
|      5 | 01/02/2009 10:00 | 01/02/2009 22:00 |           1 |
|      6 | 01/02/2009  3:00 | 01/02/2009  4:00 |           3 |
|      6 | 01/02/2009  3:00 | 01/02/2009  4:00 |           4 |
+--------+------------------+------------------+-------------+

Running the updated query we have the following result:

+--------+------------------+------------------+-------------+
|ColumnId| InTime           | OutTime          |OverlappingId|
+--------+------------------+------------------+-------------+
|      1 | 01/02/2009 10:00 | 01/02/2009 20:00 |           5 |
|      2 | 01/02/2009  2:00 | 01/02/2009  2:00 |           3 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           2 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           4 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           6 | << missing row
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |           3 |
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |           6 |
|      5 | 01/02/2009 10:00 | 01/02/2009 22:00 |           1 |
|      6 | 01/02/2009  3:00 | 01/02/2009  4:00 |           3 |
|      6 | 01/02/2009  3:00 | 01/02/2009  4:00 |           4 |
+--------+------------------+------------------+-------------+

Yes, there are some IDs that are repeated, but that's because they overlaps with different records.

The question also asks for the number of overlapping rows. I'm not sure, and the question isn't clear enough, if it wants the number of overlapping rows regarding the original table.

Some people have suggested using the a.ColumnId < b.ColumnId or a.ColumnId > b.ColumnId in order to avoid repetition, however, it still doesn't work because if we did the first comparison we'd get the following result:

+--------+------------------+------------------+-------------+
|ColumnId| InTime           | OutTime          |OverlappingId|
+--------+------------------+------------------+-------------+
|      1 | 01/02/2009 10:00 | 01/02/2009 20:00 |           5 |
|      2 | 01/02/2009  2:00 | 01/02/2009  2:00 |           3 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           4 |
|      3 | 01/02/2009  2:00 | 01/02/2009  5:00 |           6 | 
|      4 | 01/02/2009  3:03 | 01/02/2009  5:00 |           6 |
+--------+------------------+------------------+-------------+

If you notice all the 6 rows of the sample data are referenced in the results, although it has only 5 lines. I believe that, with this data, where all the rows are overlapping each other at one point or another, the number of overlapping rows is 6.

And in order to get this result, the query below could be used:

SELECT COUNT (DISTINCT a.ColumnId)
  FROM TimeTable  AS a
  JOIN TimeTable  AS b ON ((a.InTime  BETWEEN b.InTime AND b.OutTime)
                       OR  (a.OutTime BETWEEN b.InTime AND b.OutTime) 
                       OR  (a.InTime < b.InTime AND a.OutIme > b.OutTime))
                      AND  (a.ColumnId != b.ColumnId)

Which returns the count of all 6 rows.

Paulo Santos
Does not work. Try a.InTime=2am, a.OutTime=5am, b.InTime=3am, b.OutTime=4am. Also, returns two rows for each overlap.
Vadim K.
Correction: returns two rows for some overlaps, not all.
Vadim K.
@Vladim K. Before you tell something doesn't work, try running some test.
Paulo Santos
Paulo, I did test and that's why I left the comment. Overall, there will be twice as many comparisons performed than necessary (the join condition needs to be either `a.ColumnId < b.ColumnId` or `a.ColumnId > b.ColumnId` to address this). The overlap test misses the case when one range fits completely within another range. As coded, this results in two returned rows per partial overlap and one returned row per full overlap. Without getting hostile, consider in what way this result is correct and desirable.
Vadim K.
@Vladmim K. I didn't understand here: "The overlap test misses the case when one range fits completely within another range". Why? If the start of a range falls within the boundaries of another range the comparison will be validated AND the row will be returned. But I still doesn't see the problem with the ColumnId comparison. Because, for instance, the id #6 has two overlaps one with #3 and another with #4... DANG! I just noticed that the #3 should report rows for #2, #4 AND #6. It completely missed the last one! But I don't see anything wrong about reporting 2 or 3 lines per overlapping.
Paulo Santos
OK, I think now I've got it working right!Updating answer.
Paulo Santos
A: 
Select T1.*,T2.*
From Table1 T1
Inner Join Table1 T2 ON ((T1.InTime >= T2.InTime AND T1.OutTime > T2.InTime)
                         OR (T2.InTime >= T1.InTime AND T2.OutTime > T1.InTime)) 
                        AND (T1.ColumnId != T2.ColumnId)
Nitin Midha
Thanks ..... Edited my code to reflect the same.
Nitin Midha
Returns two rows for each overlap.
Vadim K.
A: 
SELECT  T1. ColumnId, T1.Intime, T1.OutTime
FROM    T1, T2
WHERE    1 =1 
AND (    T2.Intime  BETWEEN T1.Intime AND T1.OutTime
      OR T2.OutTime BETWEEN T1.Intime AND T1.OutTime )
        AND T1.ColumnId <> T2.ColumnId
John G
Overlap check is wrong. Try T1.InTime=3am T1.OutTime=4am, T2.InTime=2am, T2.OutTime=5am.
Vadim K.
+1  A: 

Test the solutions carefully, I've found that the answers posted so far either get the overlap check wrong or return too many results (two rows for each overlap).

select
    aa.ColumnId as ColumnIdA, aa.InTime as InTimeA, aa.OutTime as OutTimeA,
    bb.ColumnId as ColumnIdB, bb.InTime as InTimeB, bb.OutTime as OutTimeB
from
    MyTable aa
    join
    MyTable bb on aa.ColumnId < bb.ColumnId
where
    aa.InTime < bb.OutTime
    and
    aa.OutTime > bb.InTime

One must be careful in defining "overlap". I assume that if the first period is 3am-to-4am and the second period is 4am-to-5am that these ranges don't overlap. If one truly wishes this case to be considered an overlap, change the <-to-<= and the >-to->= in the where clause.

Performance is proportional to the square of the number of rows. Faster solutions are possible for large data sets, but are much more involved than this one.

Vadim K.