views:

60

answers:

3

What is the best way to store occurrences of an event in a database so you can quickly pull reports on it? e.g. the total number of occurrences, or the number of occurrences within a date range.

Right now I have two database tables: one which holds all individual timestamps of the event, so I can query on a date range, and one which holds a running total, so I can quickly pull that number for a tally.

Table 1:

Event | Total_Count
------+------------
bar   |  1
foo   |  3

Table 2:

Event | Timestamp
------+----------
bar   | 1/1/2010
foo   | 1/1/2010
foo   | 1/2/2010
foo   | 1/2/2010

Is there a better approach to this problem? I'm thinking of converting Table 2 to hold daily tallies instead. That should be more efficient, since my date-range queries only ever use whole dates, not full timestamps (1/1/2010 vs. 1/1/2010 00:01:12), i.e.:

Updated Table 2

Event |   Date   | Total_Count
------+----------+------------
bar   | 1/1/2010 |  1
foo   | 1/1/2010 |  1
foo   | 1/2/2010 |  2

Perhaps there's an even smarter way to tackle this problem? Any ideas?
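For concreteness, the two designs being compared can be sketched with SQLite from Python. This is only an illustration under assumed names (`event_log`, `event_daily`, ISO dates so string comparison sorts chronologically), not the asker's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail table: one row per occurrence (the current Table 2).
cur.execute("CREATE TABLE event_log (event TEXT, ts TEXT)")

# Proposed daily-summary table: one row per event per day.
cur.execute("""CREATE TABLE event_daily (
    event TEXT, day TEXT, total_count INTEGER,
    PRIMARY KEY (event, day))""")

rows = [("bar", "2010-01-01"), ("foo", "2010-01-01"),
        ("foo", "2010-01-02"), ("foo", "2010-01-02")]
cur.executemany("INSERT INTO event_log VALUES (?, ?)", rows)

# Roll the detail rows up into the daily summary.
cur.execute("""INSERT INTO event_daily
               SELECT event, ts, COUNT(*) FROM event_log
               GROUP BY event, ts""")

# A date-range tally now reads the small summary table instead of
# scanning every individual occurrence.
cur.execute("""SELECT SUM(total_count) FROM event_daily
               WHERE event = 'foo'
                 AND day BETWEEN '2010-01-01' AND '2010-01-02'""")
print(cur.fetchone()[0])  # 3
```

The summary table holds at most one row per event per day, so its size grows with distinct (event, day) pairs rather than with raw traffic.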

+1  A: 

I would just have the one table with the timestamps of your events. Then your reporting is simply a matter of setting up your WHERE clause correctly...

Or am I missing something in your question?

AllenG
One table won't work. On a high-traffic site there will be millions of records, and querying for totals, even with some really good indexes, will take way too long.
veilig
+1  A: 

Your approach seems good. I see Table 2 as a detail table and Table 1 as a summary table. For the most part, you would be doing inserts only on Table 2, and inserts and updates on Table 1.

The updated Table 2 may not give you much additional benefit. However, you should consider it if aggregation by day is what matters most to you.

You may consider adding more attributes (columns) to the tables. For example, you could add a first_date and a last_date column to Table 1.
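One way to maintain such a summary row is a single upsert per event, which inserts the row the first time and otherwise bumps the count and stretches the date range. A minimal sketch in SQLite (requires SQLite 3.24+ for `ON CONFLICT ... DO UPDATE`; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE event_summary (
    event TEXT PRIMARY KEY,
    total_count INTEGER,
    first_date TEXT,
    last_date TEXT)""")

def record(event, day):
    # Insert a fresh summary row, or on conflict bump the count and
    # widen the first/last date range to include this occurrence.
    cur.execute("""INSERT INTO event_summary VALUES (?, 1, ?, ?)
                   ON CONFLICT(event) DO UPDATE SET
                     total_count = total_count + 1,
                     first_date = MIN(first_date, excluded.first_date),
                     last_date  = MAX(last_date,  excluded.last_date)""",
                (event, day, day))

record("foo", "2010-01-01")
record("foo", "2010-01-02")
record("bar", "2010-01-01")
print(cur.execute("SELECT * FROM event_summary WHERE event = 'foo'").fetchone())
# ('foo', 2, '2010-01-01', '2010-01-02')
```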

bobs
I like your additional ideas; they could potentially come in handy. I am considering the updated Table 2 because that table will otherwise hold millions of records. A query on a range could still take a long time, but if I know the total amounts for each day, I'm thinking the queries could run much faster.
veilig
That's great. Going with the updated Table 2 is good, then. Performance is always a good reason for a particular design.
bobs
+1  A: 

It seems like you don't really have firm requirements yet:

Changing from a timestamp to just the date portion is a big deal. Do you never want to do time-of-day analysis? For example: what's the best time of day to do maintenance if that stops "foo" from happening?

And you're not worried about size? You say you have millions of records (as if that were a lot), and then you extend every single row by an extra column. One column isn't much until the row count skyrockets, and then you really have to think about each column.

So to get the sum of an event for the last 3 days, you'd rather do this:

SELECT SUM(totcnt) FROM (
    SELECT MAX(total_count) AS totcnt FROM table WHERE date = today     AND event = 'foo'
    UNION ALL
    SELECT MAX(total_count)           FROM table WHERE date = today - 1 AND event = 'foo'
    UNION ALL
    SELECT MAX(total_count)           FROM table WHERE date = today - 2 AND event = 'foo'
) AS daily

Yeah, that looks much easier than:

SELECT COUNT(*) FROM table WHERE date BETWEEN today - 2 AND today AND event = 'foo'
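The point about the plain COUNT(*) being cheap rests on indexing: with a composite index on (event, day), the query is a range scan over the index rather than a full table scan. A hedged sketch in SQLite (illustrative names, tiny data standing in for the millions of rows discussed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE event_log (event TEXT, day TEXT)")
cur.executemany("INSERT INTO event_log VALUES (?, ?)",
                [("foo", "2010-01-01")] * 2
                + [("foo", "2010-01-02")] * 3
                + [("bar", "2010-01-01")])

# A composite index lets the engine answer the count from an index
# range scan instead of visiting every row in the table.
cur.execute("CREATE INDEX idx_event_day ON event_log (event, day)")

cur.execute("""SELECT COUNT(*) FROM event_log
               WHERE event = 'foo'
                 AND day BETWEEN '2010-01-01' AND '2010-01-03'""")
print(cur.fetchone()[0])  # 5
```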

And think about the trigger it would take to add a row: get the tally for that day and event and add one... every time you insert?
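To make that per-insert cost concrete, here is a sketch of such a maintenance trigger in SQLite (names are illustrative; the insert-or-ignore-then-update pattern avoids upsert syntax that older engines lack). Note it runs two extra statements on every single insert, which is exactly the overhead being questioned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE event_log (event TEXT, day TEXT);
CREATE TABLE event_daily (event TEXT, day TEXT, total_count INTEGER,
                          PRIMARY KEY (event, day));

-- Fires on every insert: create the day's tally row if missing,
-- then bump its count. Two extra statements per logged event.
CREATE TRIGGER bump_daily AFTER INSERT ON event_log
BEGIN
    INSERT OR IGNORE INTO event_daily VALUES (NEW.event, NEW.day, 0);
    UPDATE event_daily SET total_count = total_count + 1
     WHERE event = NEW.event AND day = NEW.day;
END;
""")

cur.executemany("INSERT INTO event_log VALUES (?, ?)",
                [("foo", "2010-01-02"), ("foo", "2010-01-02"),
                 ("bar", "2010-01-01")])
print(cur.execute("""SELECT total_count FROM event_daily
                     WHERE event = 'foo' AND day = '2010-01-02'""").fetchone()[0])  # 2
```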

Not sure what kind of server you have, but I summed 1 million rows in 285 ms. So: how many millions will you have, how many times do you need to sum them, and is each query for the same date range or completely random?

Stephanie Page