I realize this question is general, but I think people with a lot of experience in event tracking will have good insight.

I have a website where I want to track file downloads by user. Two methods come to mind:

1) Create a model called AssetDownload and populate that with the data.
2) Create a model called Event or Activity and have that be a general model for tracking events.

I see the following pros and cons:

Method 1:

  • Pro - Better readability, because the model represents the event exactly
  • Pro - No need to refactor into separate events later if the events become too different
  • Pro - No need to filter the event table on an extra "event type" parameter in every query
  • Con - Potentially lots of tables
  • Con - Maybe not very DRY, although the models could inherit from a general Event model that is not associated with a table, or I could implement an "acts as trackable" gem

Method 2:

  • Pro - Only one table for everything
  • Pro - DRY by default
  • Con - The table will potentially become very wide, with columns that are only useful for a small subset of events
  • Con - May need to refactor out specific events in the future
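To make the two shapes concrete, here is a minimal sketch of both schemas side by side, using Python's built-in sqlite3 module (all table, column, and sample-data names here are illustrative, not from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Method 1: one table per event type, with columns tailored to that event.
conn.execute("""
    CREATE TABLE asset_downloads (
        user_id INTEGER, asset_id INTEGER, downloaded_at TEXT
    )
""")

# Method 2: one generic table; every query must filter on event_type.
conn.execute("""
    CREATE TABLE events (
        event_type TEXT, user_id INTEGER, subject_id INTEGER, created_at TEXT
    )
""")

conn.execute("INSERT INTO asset_downloads VALUES (1, 10, '2010-06-01')")
conn.execute("INSERT INTO events VALUES ('asset_download', 1, 10, '2010-06-01')")

# Method 1 reads the table directly; Method 2 carries the extra predicate.
n1 = conn.execute("SELECT COUNT(*) FROM asset_downloads").fetchone()[0]
n2 = conn.execute(
    "SELECT COUNT(*) FROM events WHERE event_type = 'asset_download'"
).fetchone()[0]
print(n1, n2)  # 1 1
```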

I am tempted to go with Method 1 because it follows a minimum-viable-product thought process: just build what I need. I'll eventually add five models, one for each event type; if I reach twenty models, I can refactor into a Single Table Inheritance scheme. At least at that point I will know what I am refactoring, whereas designing for the future involves some guesswork right now.

However, I am successfully using Method 2 on another website right now. I just want to see what other people are doing.

Update

I want to mention that the events I am logging will need to be accessed quite often. I will be providing a dashboard where users can view file downloads by user and by date. Please consider this if your answer involves using an Audit log model.

+1  A: 

Method 2 is the right way to do it. It's the way I've always done it, except that I call the model an Audit log and make it very generic so it can be used for lots of things.

Don't make the table wide; instead, write multiple entries if you need to record more than one type of information.

Pseudo-DDL (the types may vary):

CREATE TABLE Audit (
   Type           INTEGER,   -- FK identifying the entry type
   DateTime       TIMESTAMP, -- entry time
   RequesterID    INTEGER,   -- FK identifying the user/process initiating the request
   Object         TEXT,      -- filename etc.
   ObjectClass    INTEGER,   -- FK defining the type of the object
   AccessType     INTEGER,   -- FK defining the type of access (download etc.)
   AccessOverride INTEGER,   -- FK set if accessed via impersonation
   Status         INTEGER    -- FK result of operation: success/fail
);
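As a rough illustration of how this schema behaves in practice, here is a minimal sketch using Python's built-in sqlite3. The table and column names follow the DDL above; the concrete types, the FK code values (AccessType 1 = download, Status 1 = success), and the sample rows are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, illustration only
conn.execute("""
    CREATE TABLE Audit (
        Type           INTEGER,  -- FK identifying the entry type
        DateTime       TEXT,     -- entry time (ISO 8601)
        RequesterID    INTEGER,  -- FK for the user initiating the request
        Object         TEXT,     -- filename etc.
        ObjectClass    INTEGER,  -- FK defining the type of the object
        AccessType     INTEGER,  -- FK defining the type of access
        AccessOverride INTEGER,  -- FK set if accessed via impersonation
        Status         INTEGER   -- result of operation: success/fail
    )
""")

# Hypothetical codes: AccessType 1 = download, Status 1 = success.
rows = [
    (1, "2010-06-01T10:00:00", 42, "report.pdf", 1, 1, None, 1),
    (1, "2010-06-01T11:30:00", 42, "report.pdf", 1, 1, None, 1),
    (1, "2010-06-02T09:15:00", 7,  "slides.ppt", 1, 1, None, 1),
]
conn.executemany("INSERT INTO Audit VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Dashboard-style summary: download counts per file per requester.
summary = conn.execute("""
    SELECT Object, RequesterID, COUNT(*)
    FROM Audit
    WHERE AccessType = 1
    GROUP BY Object, RequesterID
    ORDER BY Object, RequesterID
""").fetchall()
print(summary)  # [('report.pdf', 42, 2), ('slides.ppt', 7, 1)]
```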

NOTE: Originally this was loosely based on the VMS Audit log model.

Richard Harrison
Richard - the way I understand Audit logs is that they are good for logging things you want to run large reports on in the future. For example, I can log every time a user is created, updated, etc. I did not make this clear, but I am talking about logging things that need to be "reported on" all the time. For example, with file downloads, my users will have a dashboard showing how many times a file is downloaded, and they will be able to filter the graph by date and by downloader. Do you still believe an Audit log model is the correct choice?
Tony
I still think that this is the right way. It doesn't matter whether it is called Audit or Activity; it's the structure that is important. With a decent set of indexes you will find it easy and reasonably quick to extract summaries. If you can make Object an FK reference, it will be even better. The Type, ObjectClass, and AccessType fields are there to allow better extraction of smaller result sets (using indexes). It has worked fine for me for reports similar to those you require; with more than 50k audit entries, performance was acceptable.
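The point about index-backed summaries can be sketched as follows (again SQLite via Python's sqlite3; the compound index and the date-bucketed query are one assumed way to support the dashboard filters, not Richard's exact setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Audit (
        Type INTEGER, DateTime TEXT, RequesterID INTEGER,
        Object TEXT, AccessType INTEGER, Status INTEGER
    )
""")
# A compound index covering the filters the dashboard uses most.
conn.execute(
    "CREATE INDEX idx_audit_access ON Audit (AccessType, Object, DateTime)"
)

conn.executemany(
    "INSERT INTO Audit VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "2010-06-01", 42, "report.pdf", 1, 1),
     (1, "2010-06-02", 42, "report.pdf", 1, 1),
     (1, "2010-06-02", 7,  "report.pdf", 1, 1)],
)

# Downloads of one file, grouped by day; the index narrows the scan.
per_day = conn.execute("""
    SELECT DateTime, COUNT(*) FROM Audit
    WHERE AccessType = 1 AND Object = 'report.pdf'
    GROUP BY DateTime ORDER BY DateTime
""").fetchall()
print(per_day)  # [('2010-06-01', 1), ('2010-06-02', 2)]
```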
Richard Harrison
+1  A: 

I've generally used Method 2 in my designs over the years. Table width has never come up as an issue, because the table has generally been very string-heavy with event descriptions. I suppose this means that any audit review will involve a good bit of manual parsing by the auditor, but by the time you're at an audit, that kind of detective work is usually found in any design.

One way I've addressed table width recently has been to store a lot of the details about the event in an XML blob. MSSQL supports it well enough these days, and I can build a simple reporting tool to pull from it anyway. In terms of refactoring out specific events, this often comes down to just the reporting tools. I'm certainly no data-model expert and I can't advise you on very large tables, but the database folks I've worked with in the past have always preferred Method 2 as well and have built views/reports/etc. around it.

David
Wondering if it's just me that doesn't like storing XML blobs - it feels like a data-modelling deficiency, because I can't really do anything (search, sort, index, etc.) with the data in the blobs.
Richard Harrison
I've gotten flak for it before; it's definitely not always a good idea. It's just a useful option sometimes, when it's needed.
David
+1  A: 

There is a 3rd option:

event_header: 
  id
  date
  time
  type
  code
  ... 

event_type_data: 
  PK(id)
  FK(event_id)
  special_field1
  special_field2 

Your download query knows that the event type is, say, 4, so it does a join on the event_type_data table:

select ev.*, evd.* from event_header ev join event_type_data evd on evd.event_id = ev.id where ev.type = 4 
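The header/detail split can be exercised end to end with a quick sketch (SQLite via Python's sqlite3; the sample rows and the assumption that type 4 means "file download" are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_header (
        id INTEGER PRIMARY KEY, date TEXT, type INTEGER
    );
    CREATE TABLE event_type_data (
        id INTEGER PRIMARY KEY,
        event_id INTEGER REFERENCES event_header(id),
        special_field1 TEXT,
        special_field2 TEXT
    );
    INSERT INTO event_header VALUES (1, '2010-06-01', 4);
    INSERT INTO event_header VALUES (2, '2010-06-01', 9);
    INSERT INTO event_type_data VALUES (1, 1, 'report.pdf', 'pdf');
    INSERT INTO event_type_data VALUES (2, 2, 'login', NULL);
""")

# Type 4 is assumed to mean "file download"; the join pulls its detail row.
rows = conn.execute("""
    SELECT ev.id, ev.date, evd.special_field1
    FROM event_header ev
    JOIN event_type_data evd ON evd.event_id = ev.id
    WHERE ev.type = 4
""").fetchall()
print(rows)  # [(1, '2010-06-01', 'report.pdf')]
```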

Overcomplication? Perhaps. Slower? Probably. Confusing for future developers? Yes. Viable? Certainly.

Me, I'd probably go with Method 2 and have a text field for special data in a JSON or XML format, or simply "key:value,key:value".
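The "special data as JSON" variant, sketched with Python's stdlib json and sqlite3 modules (the table name and the keys inside the blob are hypothetical). Note Richard's caveat above still applies: the serialized values can't easily be searched, sorted, or indexed by the database:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (type TEXT, details TEXT)")

# Event-specific attributes serialized into one generic text column.
conn.execute(
    "INSERT INTO events VALUES (?, ?)",
    ("download", json.dumps({"file": "report.pdf", "bytes": 1024})),
)

# Reading it back: the application, not the database, parses the blob.
type_, details = conn.execute("SELECT type, details FROM events").fetchone()
parsed = json.loads(details)
print(type_, parsed["file"])  # download report.pdf
```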

Chris Kaminski