My memory is failing me. I have a simple audit log table based on a trigger:

ID            int (identity, PK)
CustomerID    int               
Name          varchar(255)      
Address       varchar(255)      
AuditDateTime datetime          
AuditCode     char(1)           


It has data like this:

ID CustomerID Name    Address             AuditDateTime           AuditCode
1  123        Bob     123 Internet Way    2009-07-17 13:18:06.353 I
2  123        Bob     123 Internet Way    2009-07-17 13:19:02.117 D
3  123        Jerry   123 Internet Way    2009-07-17 13:36:03.517 I
4  123        Bob     123 My Edited Way   2009-07-17 13:36:08.050 U
5  100        Arnold  100 SkyNet Way      2009-07-17 13:36:18.607 I
6  100        Nicky   100 Star Way        2009-07-17 13:36:25.920 U
7  110        Blondie 110 Another Way     2009-07-17 13:36:42.313 I
8  113        Sally   113 Yet another Way 2009-07-17 13:36:57.627 I


What would be an efficient select statement to get all of the most current records between a start and end time? FYI: I for insert, D for delete, and U for update.

Am I missing anything in the audit table? My next step is to create an audit table that only records changes, yet still lets you extract the most recent records for a given time frame. For the life of me I cannot find this on any search engine easily. Links would work too. Thanks for the help.

+1  A: 

Another (better?) method to keep audit history is to use 'startDate' and 'endDate' columns rather than the auditDateTime and AuditCode columns. This is often the approach for tracking Type 2 changes (new versions of a row) in data warehouses.

This lets you more directly select the current rows (WHERE endDate is NULL), and you will not need to treat updates differently than inserts or deletes. You simply have three cases:

  • Insert: insert the full row with a startDate and a NULL endDate
  • Delete: set the endDate of the existing current row (the row where endDate is NULL)
  • Update: perform a Delete, then an Insert
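A minimal sketch of the three cases, assuming SQL Server syntax; the table and parameter names here are illustrative, not taken from the question:

```sql
-- Hypothetical Type 2 style audit table
create table CustomerAudit (
    ID         int identity primary key,
    CustomerID int,
    Name       varchar(255),
    Address    varchar(255),
    startDate  datetime not null,
    endDate    datetime null      -- NULL marks the current row
)

-- Insert: add a new current row
insert into CustomerAudit (CustomerID, Name, Address, startDate, endDate)
values (@CustomerID, @Name, @Address, getdate(), NULL)

-- Delete: expire the current row
update CustomerAudit
set endDate = getdate()
where CustomerID = @CustomerID and endDate is NULL

-- Update: expire the current row, then insert the new version
update CustomerAudit
set endDate = getdate()
where CustomerID = @CustomerID and endDate is NULL

insert into CustomerAudit (CustomerID, Name, Address, startDate, endDate)
values (@CustomerID, @NewName, @NewAddress, getdate(), NULL)
```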

Your select would simply be:

select * from AuditTable where endDate is NULL

Anyway, here's my query for your existing schema:

declare @from datetime
declare @to datetime

select b.* from (
  select
    customerId,
    /* take the latest I/U time; D rows still count in the HAVING below */
    max(case when auditcode in ('I', 'U') then auditdatetime end) 'auditDateTime'
  from
    AuditTable
  where
    auditdatetime between @from and @to
  group by customerId
  having 
    /* rely on "current" being defined as INSERTS > DELETES */
    sum(case when auditcode = 'I' then 1 else 0 end) > 
    sum(case when auditcode = 'D' then 1 else 0 end)
) a
cross apply (
  select top 1 customerId, name, address, auditdateTime
  from AuditTable
  where auditdatetime = a.auditdatetime and customerId = a.customerId
) b

References

A cribsheet for data warehouses, but has a good section on type 2 changes (what you want to track)

MSDN page on data warehousing

Jeff Meatball Yang
If you are looking at versioned data in general, you are 100% right. When picking the most recent row, where the topmost row is "not-expired", it is best that all versioned rows have a fromDate/toDate pair so that a SARG-able query against such an index is possible. Otherwise a self-join and double scan is generally required at a minimum (and often a table scan and a half if the query is "get all maximum versions of the entity rows from the table").
polyglot
The first reference to "use a 'startDate' and 'endDate' column rather than an auditDateTime and AuditCode column", where would I read about this?
Dr. Zim
That's a technique borrowed from data warehousing. See the references, and look for "Type 2" within the text.
Jeff Meatball Yang
TYVM. I read the article. I have been a snowflake and didn't know it. LOL. Anyway, I think I finally understand your Insert, Update, and Delete comments above to apply to the Audit table (yea it took a while). Very good idea because I could have three transactions running from it, a delete from where exists, and an insert where not exists, and an update join. (I hope I got it right). Thanks again.
Dr. Zim
+1  A: 

Another approach is to use a sub-select:

select a.ID
       , a.CustomerID 
       , a.Name
       , a.Address
       , a.AuditDateTime
       , a.AuditCode
from   myauditlogtable a
inner join
       (select s.CustomerID
               , max(s.AuditDateTime) as maxAuditDateTime
        from myauditlogtable as s 
        group by s.CustomerID) 
        as subq
on  subq.CustomerID = a.CustomerID
and subq.maxAuditDateTime = a.AuditDateTime;
nos
+2  A: 

Ok, a couple of things for audit log tables.

For most applications, we want audit tables to be extremely quick on insertion.

If the audit log is truly for diagnostics or for very irregular audits, then the quickest insertion strategy is to make the table physically ordered by insertion time.

This means putting the audit time as the first column of the clustered index, e.g.

create unique clustered index idx_mytable on mytable(AuditDateTime, ID)

This will allow for extremely efficient select queries upon AuditDateTime (O(log n)) and effectively O(1), append-only insertions.

If you wish to look up your audit table on a per CustomerID basis, then you will need to compromise.

You may add a nonclustered index upon (CustomerID, AuditDateTime), which will allow O(log n) lookup of per-customer audit history; however, the cost is the maintenance of that nonclustered index, itself O(log n) per insertion.
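For example (the index and table names here are illustrative):

```sql
-- Supports per-customer history lookups at O(log n),
-- at the cost of extra index maintenance on every insert
create nonclustered index idx_mytable_customer
on mytable (CustomerID, AuditDateTime)
```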

However, that insertion-time penalty may be preferable to the table scan (O(n) time complexity) you will pay if you don't have an index on CustomerID and this is a regularly performed query. An O(n) lookup locks the table against the writing process for its duration, so it is sometimes in the writers' interest to be slightly slower on insert if that guarantees readers won't block their commits by table-scanning for lack of a good supporting index.


Addition: if you are looking to restrict to a given timeframe, the most important thing first of all is the index upon AuditDateTime. And make it clustered as you are inserting in AuditDateTime order. This is the biggest thing you can do to make your query efficient from the start.

Next, if you are looking for the most recent update for all CustomerID's within a given timespan, well thereafter a full scan of the data, restricted by insertion date, is required.

You will need to do a subquery upon your audit table, restricted to the range:

select CustomerID, max(AuditDateTime) MaxAuditDateTime 
from AuditTrail 
where AuditDateTime >= @begin and AuditDateTime <= @end
group by CustomerID

and then incorporate that into your select query proper, e.g.

select AuditTrail.* from AuditTrail
inner join 
    (select CustomerID, max(AuditDateTime) MaxAuditDateTime 
     from AuditTrail 
     where AuditDateTime >= @begin and AuditDateTime <= @end
     group by CustomerID
    ) filtration
    on filtration.CustomerID = AuditTrail.CustomerID and 
       filtration.MaxAuditDateTime = AuditTrail.AuditDateTime
polyglot
A: 

start and end time? e.g. between 1am and 3am?
or start and end datetime? e.g. 2009-07-17 13:36 to 2009-07-18 13:36?

MakkyNZ
An entire datetime start to an entire datetime end, rather than a repeating time of day. I need to select all records up to the point when I want to process them, like on the hour. They continuously change, and I want to process up to a point without stopping their process.
Dr. Zim