We have a web application that is built on top of a SQL database. Several different types of objects can have comments added to them, and some of these objects need field-level change tracking, similar to how field changes (such as status, assignment, and priority) are tracked in most issue-tracking systems. We'd like to show who made the change, what the previous value was, and what the new value is.

At a pure design level, the most straightforward approach would be to track every change to any object in a single generic table, with columns for the object type, the object's primary key, the primary key of the user who made the change, the field name, and the old and new values. In our case, each row would also optionally carry a comment ID if the user entered a comment when making the change.
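For reference, here is a rough T-SQL sketch of the kind of generic table I have in mind (all names and types here are illustrative only, not our actual schema):

    -- Hypothetical generic change-log table; names/types are illustrative only
    CREATE TABLE ObjectChangeLog (
        change_id   BIGINT IDENTITY(1,1) PRIMARY KEY,
        object_type VARCHAR(50)   NOT NULL,   -- e.g. 'Ticket', 'Project'
        object_id   INT           NOT NULL,   -- primary key of the changed object
        changed_by  INT           NOT NULL,   -- primary key of the user who made the change
        field_name  VARCHAR(100)  NOT NULL,
        old_value   NVARCHAR(MAX) NULL,
        new_value   NVARCHAR(MAX) NULL,
        comment_id  INT           NULL,       -- optional link to the comment entered with the change
        changed_at  DATETIME      NOT NULL DEFAULT GETDATE()
    );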

However, with how quickly this data can grow, is this the best architecture? What are some methods commonly employed to add this type of functionality to an already large-scale application?

[Edit] I'm starting a bounty on this question mainly because I'd like to find out which architecture handles scale best. Tom H.'s answer is informative, but the recommended solution seems fairly space-inefficient (a new row for every new state of an object, even when many columns haven't changed), and it doesn't satisfy the requirement that we must be able to track changes to user-created fields as well. In particular, I'm likely to accept an answer that explains how a common issue-tracking system (JIRA or similar) has implemented this.

+2  A: 

There are several options available to you. You could have audit tables that basically mirror the base tables but also include a change date/time, a change type, and the user; these can be populated by a trigger. This solution is typically better suited (IMO) to behind-the-scenes auditing than to solving an application-specific requirement.
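As a rough illustration (SQL Server syntax; the Problem_Ticket table and columns below are just placeholders), the trigger-based version might look something like this:

    -- Audit table mirroring the base table, plus change metadata
    CREATE TABLE Problem_Ticket_Audit (
        ticket_id       INT          NOT NULL,
        ticket_name     VARCHAR(100) NULL,
        change_type     CHAR(1)      NOT NULL,                     -- 'I', 'U' or 'D'
        change_datetime DATETIME     NOT NULL DEFAULT GETDATE(),
        changed_by      VARCHAR(128) NOT NULL DEFAULT SUSER_SNAME() -- DB login, not necessarily the application user
    );
    GO

    CREATE TRIGGER trg_Problem_Ticket_Audit
    ON Problem_Ticket
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Inserted rows and the "after" image of updated rows
        INSERT INTO Problem_Ticket_Audit (ticket_id, ticket_name, change_type)
        SELECT i.ticket_id, i.ticket_name,
               CASE WHEN EXISTS (SELECT 1 FROM deleted d WHERE d.ticket_id = i.ticket_id)
                    THEN 'U' ELSE 'I' END
        FROM inserted i;

        -- Deleted rows (no matching row in "inserted")
        INSERT INTO Problem_Ticket_Audit (ticket_id, ticket_name, change_type)
        SELECT d.ticket_id, d.ticket_name, 'D'
        FROM deleted d
        WHERE NOT EXISTS (SELECT 1 FROM inserted i WHERE i.ticket_id = d.ticket_id);
    END;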

The second option is what you've described: a generic table that holds each individual change, with a type code to show which attribute was changed. I personally don't like this solution because it prevents the use of check constraints on the columns and can also prevent the use of foreign key constraints.

The third option (which would be my initial choice, given the information available) would be a separate historical change table that is updated through the application and includes the PK for each table as well as the column(s) you would be tracking. It differs from the first option in that the application, rather than a trigger, is responsible for updating the table as needed. I prefer this over the first option in your case because you have a genuine business requirement you're trying to solve, not a back-end technical requirement like auditing. By putting the logic in the application you get a bit more flexibility; for example, there may be some changes you don't want to track because they're maintenance updates.

With the third option you can either keep the "current" data in the base table, or keep each historically tracked column only in the historical table, in which case you look at the latest row to get the object's current state. I prefer the latter because it avoids duplicating data in your database and having to look at multiple tables for the same data.

So, you might have:

    Problem_Ticket (ticket_id, ticket_name)
    Problem_Ticket_History (ticket_id, change_datetime, description, comment, username)

Alternatively, you could use:

    Problem_Ticket (ticket_id, ticket_name)
    Problem_Ticket_Comments (ticket_id, change_datetime, comment, username)
    Problem_Ticket_Statuses (ticket_id, change_datetime, status_id, username)
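In DDL terms, the first variant might be sketched roughly like this (data types are just placeholders):

    CREATE TABLE Problem_Ticket (
        ticket_id   INT          NOT NULL PRIMARY KEY,
        ticket_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE Problem_Ticket_History (
        ticket_id       INT           NOT NULL REFERENCES Problem_Ticket (ticket_id),
        change_datetime DATETIME      NOT NULL DEFAULT GETDATE(),
        description     VARCHAR(500)  NULL,
        comment         NVARCHAR(MAX) NULL,
        username        VARCHAR(128)  NOT NULL,
        PRIMARY KEY (ticket_id, change_datetime)
    );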

Tom H.
Or you could use the audit tables to capture most of the information (thus providing a way to track changes made from anywhere) and a related change_comments table to capture comments on changes made from the GUI. It just depends on how critical auditing the records is.
HLGEM
Definitely. If auditing were an additional requirement then you could combine the two.
Tom H.
@HLGEM - All interactivity, including that from the GUI, passes through a single API, so for us application triggers are as reliable as database triggers. In general, though, your design sounds like an elegant solution. @Tom H. - How do you deal with scale in your suggested design? I also probably should have mentioned that these objects can have user-added properties, which we want to be able to track as well. We're already going to have a lot of data, so scale is very important to us.
Renesis
Renesis already knows this solution. The main problem is "how quickly this data can grow", so the solution would be to find a way to limit the amount of data that is stored.
Gladwin Burboz
+1  A: 

It depends on your exact requirements, and this might not be for you, but for general auditing in the database with triggers (so front-end and even the SP interface layer don't matter), we use AutoAudit, and it works very well.

Cade Roux
I don't know that I'd say that the front-end and interface don't matter. If the front-end is using a single user account to connect to the database then you'll need to build in something to accurately capture the user actually making the changes rather than just the application account. I haven't used AutoAudit, but I can't imagine it getting around this issue without at least some kind of change to the front-end.
Tom H.
That's correct - you can only use information available on the connection in the generated triggers. If it's not in the connection string, the session, or the new rows coming into the table, you may not have all the information you need - but that's true in all designs. You can work around that somewhat by using SET CONTEXT_INFO at the beginning of every connection, as sketched below.
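For example (SQL Server specific; the user id 42 below is just a stand-in for whatever your application knows about the real user):

    -- At the start of each connection/session: stash the application user's id
    DECLARE @ctx VARBINARY(128) = CAST(42 AS BINARY(4));   -- 42 = hypothetical application user id
    SET CONTEXT_INFO @ctx;

    -- Later, e.g. inside an audit trigger, read it back
    DECLARE @audit_user INT = CAST(SUBSTRING(CONTEXT_INFO(), 1, 4) AS INT);
    SELECT @audit_user AS audit_user;                       -- 42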
Cade Roux
In our case we have a single user connection to the database. However, I'm interested in general best practices, so thanks for your suggestion... +1
Renesis
+1  A: 

I'm not sure about the "issue tracking" specific approach, but I wouldn't say there is one ultimate way to do this. There are a number of options for accomplishing it, each with its own benefits and drawbacks, as illustrated here.

I personally would just create one table that has some metadata columns about the change and a column that stores XML of the serialized version of the old object (or whatever you care about). That way, if you want to show the history of the object, you just fetch all the old versions, re-hydrate them, and you're done. One table to rule them all.

One often-overlooked solution would be to use Change Data Capture. This might give you more space savings and better performance if those are real concerns.
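For example, enabling CDC on a table (SQL Server; the Problem_Ticket table from Tom H.'s answer is borrowed purely as an illustration) looks roughly like this:

    -- Enable Change Data Capture at the database level (requires sysadmin)
    EXEC sys.sp_cdc_enable_db;

    -- Enable CDC for an individual table
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Problem_Ticket',
        @role_name     = NULL;        -- no gating role

    -- Changes then accumulate in the generated cdc.dbo_Problem_Ticket_CT table
    -- and can be queried via cdc.fn_cdc_get_all_changes_dbo_Problem_Ticket(@from_lsn, @to_lsn, ...)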

Good luck.

Greg Roberts
A: 

Here is the solution I would recommend to attain your objective.

Design your auditing model as shown below.



  ----------------  1      *  ------------                       
 | AuditEventType |----------| AuditEvent |                      
  ----------------            ------------                       
                                | 1    | 1                       
                                |      |                         
               -----------------        -------------            
              | 0,1                                  | +         
    -------------------                       ----------------   
   | AuditEventComment |                     | AuditDataTable |  
    -------------------                       ----------------   
                                                     | 1         
                                                     |           
                                                     |           
                                                     | +         
          -----------------  +             1  --------------     
         | AuditDataColumn |------------------| AuditDataRow |   
          -----------------                   --------------     



AuditEventType

Contains the list of all possible event types in the system and a generic description of each.

AuditEvent

Contains information about the particular event that triggered this action.

AuditEventComment

Contains an optional custom user comment about the audit event. Comments can be very large, so it is better to store them in a CLOB.

AuditDataTable

Contains the list of one or more tables that were impacted by the respective AuditEvent.

AuditDataRow

Contains the list of one or more identifying rows in the respective AuditDataTable that were impacted by the respective AuditEvent.

AuditDataColumn

Contains the list of zero or more columns of the respective AuditDataRow whose values changed, along with their previous and current values.

AuditBuilder

Implement an AuditBuilder (Builder pattern). Instantiate it at the beginning of the event and make it available in the request context, or pass it along with your other DTOs. Each time you make changes to your data anywhere in your code, invoke the appropriate call on the AuditBuilder to notify it of the change. At the end, invoke build() on the AuditBuilder to form the structure above, and then persist it to the database.

Make sure all your activity for the event is in a single DB transaction along with persistence of audit data.
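A rough DDL sketch of the core tables, and of persisting an event atomically (all names and types are illustrative only; AuditEventType and AuditEventComment are omitted for brevity):

    CREATE TABLE AuditEvent (
        audit_event_id BIGINT IDENTITY(1,1) PRIMARY KEY,
        event_type_id  INT          NOT NULL,                   -- FK to AuditEventType
        event_datetime DATETIME     NOT NULL DEFAULT GETDATE(),
        username       VARCHAR(128) NOT NULL
    );

    CREATE TABLE AuditDataTable (
        audit_data_table_id BIGINT IDENTITY(1,1) PRIMARY KEY,
        audit_event_id      BIGINT       NOT NULL REFERENCES AuditEvent (audit_event_id),
        table_name          VARCHAR(128) NOT NULL
    );

    CREATE TABLE AuditDataRow (
        audit_data_row_id   BIGINT IDENTITY(1,1) PRIMARY KEY,
        audit_data_table_id BIGINT       NOT NULL REFERENCES AuditDataTable (audit_data_table_id),
        row_key             VARCHAR(200) NOT NULL               -- identifying PK value(s) of the affected row
    );

    CREATE TABLE AuditDataColumn (
        audit_data_column_id BIGINT IDENTITY(1,1) PRIMARY KEY,
        audit_data_row_id    BIGINT        NOT NULL REFERENCES AuditDataRow (audit_data_row_id),
        column_name          VARCHAR(128)  NOT NULL,
        old_value            NVARCHAR(MAX) NULL,
        new_value            NVARCHAR(MAX) NULL
    );

    -- The business change and its audit rows go in one transaction
    BEGIN TRANSACTION;
        -- ... your actual business change(s) here ...
        INSERT INTO AuditEvent (event_type_id, username) VALUES (1, 'jsmith');
        -- ... plus the AuditDataTable / AuditDataRow / AuditDataColumn rows built by the AuditBuilder ...
    COMMIT TRANSACTION;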

Gladwin Burboz
A: 

I don't understand the actual usage scenarios for the audited data, though... do you need to just keep track of the changes? Will you need to "roll back" some of the changes? How frequent (and flexible) do you want the audit log report/lookup to be?

Personally, I'd investigate something like this:

Create an AuditTable. This has an ID, a version number, a user id, and a CLOB field.

  • When Object #768795 is created, add a row to AuditTable with the values: Id = #768795, Version: 0, User: (id of the user who created the new object), clob: an XML representation of the whole object. (If space is a problem and access to this table is infrequent, you could use a blob and "zip" the XML representation on the fly.)

Every time you change something, create a new version and store the whole object serialized as XML. If you need to produce an audit log you have everything you need, and you can use simple "text compare" tools or libraries to see what changed over time (a bit like Wikipedia does).

If you want to track only a subset of the fields, either because the rest are immutable or insignificant, or because you are desperate for speed/space, just serialize the subset you care about.
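A minimal sketch of such a table in SQL Server terms, with an XML column standing in for the CLOB (names and types are illustrative):

    CREATE TABLE AuditTable (
        object_id  INT      NOT NULL,                 -- e.g. 768795
        version    INT      NOT NULL,                 -- 0 for the initial snapshot
        user_id    INT      NOT NULL,                 -- who created/changed the object
        snapshot   XML      NULL,                     -- full serialized object
        created_at DATETIME NOT NULL DEFAULT GETDATE(),
        PRIMARY KEY (object_id, version)
    );

    -- Pull consecutive versions of one object to diff them for the history view
    SELECT a.version, a.snapshot AS old_version, b.snapshot AS new_version
    FROM AuditTable a
    JOIN AuditTable b
      ON b.object_id = a.object_id AND b.version = a.version + 1
    WHERE a.object_id = 768795
    ORDER BY a.version;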

p.marino
Good questions - No, we don't want to roll back the changes. We do, however, want to be able to output a history of the object, like common issue-tracking systems do. We want that history to look something like: "User X changed Field Y from value A to value Z"
Renesis