I have a table with columns Id and EmployeeID. The data has the following peculiarity: in some stretches of consecutive Ids the same EmployeeID repeats, for example

Id | EmployeeID
---------------
1  |     1
2  |     1
3  |     2
4  |     5
5  |     1
6  |     1

I want to build a query that finds blocks of rows with the same EmployeeID and consecutive Ids (with a minimum of x rows per block). So far I have come up with:

SELECT EmployeeID, MIN(Id), MAX(Id), COUNT(*)
FROM recs
GROUP BY EmployeeID
HAVING COUNT(*) > 5 AND
       MAX(Id) - MIN(Id) + 1 = COUNT(*)

This query returns some (but not all) blocks of data: it only works when an employee appears in a single block. Can anyone come up with a solution that returns every separate block for each employee?

+1  A: 

Not the best solution, but it should work (in this example, each returned row has 3 consecutive preceding ids with the same EmployeeID):

SELECT Id, EmployeeID FROM
(
    SELECT r.Id, r.EmployeeID,
        -- each subquery is 1 if the row k ids earlier belongs to the same employee, else 0
        (SELECT COUNT(1) FROM recs r1 WHERE (r1.EmployeeID = r.EmployeeID AND r1.Id = r.Id-1)) AS c1,
        (SELECT COUNT(1) FROM recs r2 WHERE (r2.EmployeeID = r.EmployeeID AND r2.Id = r.Id-2)) AS c2,
        (SELECT COUNT(1) FROM recs r3 WHERE (r3.EmployeeID = r.EmployeeID AND r3.Id = r.Id-3)) AS c3
    FROM recs r
) tab1
WHERE (tab1.c1+tab1.c2+tab1.c3 = 3);

I assumed that Id is a primary (or unique) key. If it is not, change each of the subqueries slightly, to something like SELECT IF(COUNT(1) > 0, 1, 0) .....

a1ex07
+2  A: 

Join to the same table where table1.Id = table2.Id + 1 and table1.employeeid = table2.employeeid
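
A minimal sketch of that self-join, wrapped in Python's built-in sqlite3 module only so it can be run anywhere; the SQL string is the part that matters. The aliases t1/t2, the PrevId column and the helper name adjacent_pairs are made up for illustration, and the rows are the sample from the question. Note that it returns one row per adjacent pair, so it does not yet group the pairs into blocks.

```python
import sqlite3

# Sample rows (Id, EmployeeID) taken from the question.
SAMPLE = [(1, 1), (2, 1), (3, 2), (4, 5), (5, 1), (6, 1)]

# Self-join: match every row with the row whose Id is one lower,
# keeping only matches where both rows have the same EmployeeID.
PAIRS_SQL = """
SELECT t1.EmployeeID, t2.Id AS PrevId, t1.Id AS Id
FROM recs AS t1
JOIN recs AS t2
  ON t1.Id = t2.Id + 1
 AND t1.EmployeeID = t2.EmployeeID
ORDER BY t1.Id
"""

def adjacent_pairs(rows):
    """Return (EmployeeID, PrevId, Id) for each pair of adjacent Ids with the same employee."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE recs (Id INTEGER PRIMARY KEY, EmployeeID INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO recs (Id, EmployeeID) VALUES (?, ?)", rows)
        return conn.execute(PAIRS_SQL).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    print(adjacent_pairs(SAMPLE))  # [(1, 1, 2), (1, 5, 6)]
```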

Gabriel McAdams
This is the first step, but I still need to get blocks of data with at least 5 consecutive IDs. Your solution will fetch all consecutive rows.
Anax
A: 

Use a temp table for this. Use this solution:

SELECT EmployeeID, MIN(Id) AS Min, MAX(Id) AS Max, COUNT(*) AS Count
INTO #TempTable
FROM recs
GROUP BY EmployeeID

SELECT * FROM #TempTable WHERE
Count > 5 AND
       Max - Min + 1 = Count

EDITED ANSWER

Please try this:

SELECT * FROM (
    SELECT EmployeeID, MIN(Id) AS min, MAX(Id) AS max, COUNT(*) AS count
    FROM recs
    GROUP BY EmployeeID
) AS [Table]   -- Table is a reserved word, so the alias has to be quoted
WHERE [Table].count > 5 AND
      [Table].max - [Table].min + 1 = [Table].count
masoud ramezani
I believe this will work exactly like the query I provided. It will only fetch blocks of data when an employee appears in a single block.
Anax
Please see the edited answer.
masoud ramezani
This still won't work. Try it on the provided data set (replace Table.count > 5 with Table.count >= 2) to see it for yourself. You're still approaching the problem in the same way.
Anax
A: 

Wow, this was a real brain teaser. I'm sure this has all kinds of holes but here's a possible solution. First our test data:

If Exists(Select 1 From INFORMATION_SCHEMA.TABLES Where TABLE_NAME = 'recs')
    DROP TABLE recs
GO
Create Table recs
(
    Id int not null
    , EmployeeId int not null
)
Insert recs(Id, EmployeeId) 
Values (1,1) ,(2,1) ,(3,1) ,(4,2) ,(5,5) ,(6,1) ,(7,1) ,(8,1) ,(10,1)   
    ,(11,1) ,(12,1) ,(13,2) ,(14,2) ,(15,2) ,(16,2)

Next, you will need a Tally (or Numbers) table that contains a sequence of numbers. I only put 500 elements in this one, but depending on the size of your data you may want more. The largest number in the Tally table must be at least as large as the largest Id in the recs table.

Create Table dbo.Tally(Num int not null)
GO
;With Numbers As
    (
    Select ROW_NUMBER() OVER ( ORDER BY s1.object_id) As Num
    From sys.columns as s1
    )
Insert dbo.Tally(Num)
Select Num
From Numbers
Where Num <= 500

Now for the actual solution. Basically, I used a series of CTEs to deduce the start and end point of the consecutive sequences.

Declare @SequenceSize int = 3   -- minimum sequence length to return (example value)
; With 
    Employees As 
    (
    Select Distinct EmployeeId 
    From dbo.Recs
    )
    , SequenceGaps As
    (
    Select E.EmployeeId, T.Num, R1.Id 
    From dbo.Tally As T
        Cross Join Employees As E
        Left Join dbo.recs As R1
            On R1.EmployeeId = E.EmployeeId
                And R1.Id = T.Num
    Where T.Num <= (    
        Select Max(R3.Id) 
        From dbo.Recs As R3
            Where R3.EmployeeId = E.EmployeeId
            )
    )
    , EndIds As
    (
    Select S.EmployeeId
        , Case When S1.Id Is Null Then S.Id End As [End]
    From SequenceGaps As S
        Join SequenceGaps As S1
            On S1.EmployeeId = S.EmployeeId
                And S1.Num = (S.Num + 1) 
    Where S.Id Is Not Null
        And S1.Id Is Null
    Union All
    Select S.EmployeeId, Max( Id )
    From SequenceGaps As S
    Where S.Id Is Not Null
    Group By S.EmployeeId
    )
    , SequencedEndIds As
    (
    Select EmployeeId, [End]
        , ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY [End]) As SequenceNum
    From EndIds
    )
    , StartIds As
    (
    Select S.EmployeeId
        , Case When S1.Id Is Null Then S.Id End As [Start]
    From SequenceGaps As S
        Join SequenceGaps As S1
            On S1.EmployeeId = S.EmployeeId
                And S1.Num = (S.Num - 1)
    Where S.Id Is Not Null
        And S1.Id Is Null
    Union All
    Select S.EmployeeId, 1 
    From SequenceGaps As S
    Where S.Id = 1
    )
    , SequencedStartIds As
    (
    Select EmployeeId, [Start]
        , ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY [Start]) As SequenceNum
    From StartIds
    )
    , SequenceRanges As
    (
    Select S1.EmployeeId, Start, [End]
    From SequencedStartIds As S1
        Join SequencedEndIds As S2
            On S2.EmployeeId = S1.EmployeeId
                And S2.SequenceNum = S1.SequenceNum
    )
Select *
From SequenceGaps As SG
Where Exists(
        Select 1
        From SequenceRanges As SR
        Where SR.EmployeeId = SG.EmployeeId
            And SG.Id Between SR.Start And SR.[End]
            And ( SR.[End] - SR.[Start] + 1 ) >= @SequenceSize
        )

The last condition in the WHERE clause, together with @SequenceSize, controls the minimum length of the sequences that are returned.
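
For comparison, the same start-point/end-point idea can be sketched without a Tally table. This is not the query above, just a smaller variant under stated assumptions: Id is unique, a block start is a row with no same-employee row at Id - 1, a block end is a row with no same-employee row at Id + 1, and each start is paired with the nearest end at or after it. It is wrapped in Python's built-in sqlite3 module only to make it runnable; the names StartId, EndId and find_runs are made up for illustration, and the rows are the test data from this answer.

```python
import sqlite3

# Test data (Id, EmployeeId) from the answer above; note the gap at Id 9.
ROWS = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 5), (6, 1), (7, 1), (8, 1),
        (10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 2), (16, 2)]

RUNS_SQL = """
SELECT EmployeeId, StartId, EndId
FROM (
    SELECT s.EmployeeId AS EmployeeId,
           s.Id         AS StartId,
           -- nearest block end at or after this start
           (SELECT MIN(e.Id)
            FROM recs AS e
            WHERE e.EmployeeId = s.EmployeeId
              AND e.Id >= s.Id
              AND NOT EXISTS (SELECT 1 FROM recs AS n
                              WHERE n.EmployeeId = e.EmployeeId
                                AND n.Id = e.Id + 1)) AS EndId
    FROM recs AS s
    -- a block start has no same-employee row directly before it
    WHERE NOT EXISTS (SELECT 1 FROM recs AS p
                      WHERE p.EmployeeId = s.EmployeeId
                        AND p.Id = s.Id - 1)
) AS r
WHERE EndId - StartId + 1 >= ?
ORDER BY EmployeeId, StartId
"""

def find_runs(rows, min_size):
    """Return (EmployeeId, StartId, EndId) for every block of at least min_size rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE recs (Id INTEGER PRIMARY KEY, EmployeeId INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO recs (Id, EmployeeId) VALUES (?, ?)", rows)
        return conn.execute(RUNS_SQL, (min_size,)).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for run in find_runs(ROWS, 3):
        print(run)
```

With a minimum size of 3 this yields three blocks for employee 1 (Ids 1-3, 6-8 and 10-12) and one for employee 2 (Ids 13-16). The correlated subqueries are simple to read but may be slow on large tables without an index on (EmployeeId, Id).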

Thomas