views:

176

answers:

3

Hi,

I have a table in which there are records about works accesing entrance doors.

DECLARE @doorStatistics TABLE
( id INT IDENTITY,
[user] VARCHAR(250),
accessDate DATETIME,
accessType VARCHAR(5)
)

Sample records:

INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:02:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:12:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:22:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:32:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:37:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:42:43.000','IN')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('John Wayne','2009-09-01 07:48:43.000','OUT')
INSERT INTO @doorStatistics([user],accessDate,accessType) VALUES ('Bruce Willis','2009-09-01 07:52:43.000','OUT')

What I want to do is a query, which gives me following result (based on above example):

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-01 | 07:02:43 | 07:48:43 |
| Bruce Willis | 2009-09-01 | 07:12:43 | 07:22:43 |
| John Wayne   | 2009-09-02 | 07:37:43 | 07:48:43 |
| Bruce Willis | 2009-09-02 | 07:42:43 | 07:52:43 |

Query I made is the following:

SELECT [user], accessDate AS [in date], 
    (SELECT MIN(accessDate) 
        FROM @doorStatistics ds2 
        WHERE accessType = 'OUT' 
            AND ds2.accessDate > ds.accessDate 
            AND ds.[user] = ds2.[user]) AS [out date] 
FROM @doorStatistics ds 
WHERE accessType = 'IN'

But it is not good, because when user forget to register his/her entrance it will produce for example something like this:

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | 07:02:43 | 09:26:43 |

While it should be

| user         | date       | inHour   | outHour  |
|--------------|------------|----------|----------|
| John Wayne   | 2009-09-02 | 07:02:43 | 07:48:43 |
| John Wayne   | 2009-09-02 | NULL     | 09:26:43 |

The 2nd reason the query is not good is the performance. I have over 200 000 records and SELECT for every row slows down the query.

The possible solution may be to join two tables

SELECT * FROM @doorStatistics WHERE accessType = 'IN'

with

SELECT * FROM @doorStatistics WHERE accessType = 'OUT'

but I have no idea what conditions to put to get correct date. Maybe some MAX or MIN functions can be put there but I have no idea.

I don't want to create temporary table and use cursors.

+1  A: 

When designing databases for temporal events that have duration, it's better practice to put the "IN" time and the "OUT" time on the same row.

All the queries you need to do are so much easier that way.

See "Joe Celko's SQL Programming Style" where he talks about temporal cohesion on pages 48 and 154.

Bill Karwin
The problem is that I didn't design this table and I cannot change its structure. However, in this case is not good solution, because every row in the table hold the data from one finger-reader (sorry, for my english, but I think you know what I mean). Some doors have two (inner and outer) readers, some only one (outer). But anyway thanks for link to the book.
Lukasz Lysik
+1  A: 

To improve performance on the structure level:

  • i suggest that you rename your accessDate column to accessDateTime
  • then you create a PERSISTENT computed column based on your accessDateTime (shown below). Then the index you need would include only the accessDate column which you will use for exact comparison together with the user
  • make sure you have proper indices on the table (from the code below you would probably need one on "user", "accessDate" and including "accessType"

accessDate column definition:

accessDate AS CONVERT(SMALLDATETIME, CONVERT(CHAR(8), accessDateTime, 112), 112) PERSISTED

Now, given that you have done it and you have SQL-2005+, this terribly long query should do the job:

WITH MatchIN (in_id, out_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime >= s.accessDateTime
                            AND z.accessDateTime <= x.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    WHERE       s.accessType = 'IN'
)
,    MatchOUT (out_id, in_id)
AS (SELECT      s.id, CASE WHEN COALESCE(y.id, s.id) = s.id THEN x.id ELSE NULL END
    FROM        @doorStatistics s
    LEFT JOIN   @doorStatistics x
            ON  x.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'IN'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                        ORDER BY z.accessDateTime DESC
                        )
    LEFT JOIN   @doorStatistics y
            ON  y.id = (SELECT  TOP 1 z.id
                        FROM    @doorStatistics z
                        WHERE   z."user" = s."user"
                            AND z.accessType = 'OUT'
                            AND z.accessDate =  s.accessDate
                            AND z.accessDateTime <= s.accessDateTime
                            AND z.accessDateTime >= x.accessDateTime
                        ORDER BY z.accessDateTime ASC
                        )
    WHERE       s.accessType = 'OUT'
)

SELECT  COALESCE(i."user", o."user") AS "user",
        COALESCE(i.accessDate, o.accessDate) AS "date",
        CONVERT(CHAR(10), i.accessDateTime, 108) AS "inHour",
        CONVERT(CHAR(10), o.accessDateTime, 108) AS "outHour"
FROM   (SELECT in_id, out_id FROM MatchIN
        UNION -- this will eliminate duplicates as the same time
        SELECT in_id, out_id FROM MatchOUT
        ) x
LEFT JOIN   @doorStatistics i
        ON  i.id = x.in_id
LEFT JOIN   @doorStatistics o
        ON  o.id = x.out_id
ORDER BY    "user", "date", "inHour"

To test for handling of missing rows, just comment out some of your INSERT statements of test data.

van
change: fixed the query, removed time-only computed column, tested - works
van
+1  A: 

You need to select the minimal OUT record for each IN record for a given user, after ensuring there is no intervening IN record (which would correspond to someone getting IN twice without ever leaving the building). That requires some modestly tricky SQL (a NOT EXISTS clause, for example). So, you are going to have a self-join on the table, plus a NOT EXISTS sub-query on the same table. Just make sure you alias all references to the table sensibly.

Jonathan Leffler