I am tracking clicks over three time periods: the past day, past week and past month.
To do this, I have three tables:
An hourly table, with columns link_id, two other attributes, and hour_1 to hour_24, together with a computed column giving the sum
A weekday table, with columns click_id, two other attributes, and day_1 to day_7, together with a computed column giving the sum
A monthday table, as above, with columns day_1 to day_31
When a click comes in, I store its key attributes like href, description, etc, in other tables, and insert or update the row(s) corresponding to the link_id in each of the above tables.
Each link can have several entries in each of the above hourly/weekday/monthday tables, depending on the two other attributes (e.g. where the user is sitting).
So if a user is Type A and sitting in X, three rows are created or added to in the above tables -- the first row records all clicks on that link over the time period, the second row records all clicks by "Type A people", and the third "All clicks by people in X".
I have designed it this way as I didn't want to have to move data around each hour/day/week/month. I just maintain pointers for "current hour" (1-24), "current day" (1-31) and "current weekday" (1-7), and write to the corresponding cells in the tables. When we enter a new period (e.g. "3pm-4pm"), I can just blank out that current column (e.g. hour_15), then start incrementing it for links as they come in. Every so often I can delete old rows which have fallen down to "all zero".
This way I shouldn't ever have to move around column data, which would likely be very expensive for what will potentially be tens of thousands of rows.
I will only be SELECTing either the current day/weekday/hour rows (prior to inserting/updating) or the TOP 20 values from the computed columns based on the attributes (and will likely cache these results for an hour or so).
After the tables populate, UPDATES will far exceed INSERTs as there aren't that many unique hrefs.
Three questions:
Is it OK to combine the three big tables into one big table of monthdays/weekdays/hours? This would give a table with 64 columns, which I'm not sure is overkill. On the other hand, keeping them separate like they are now triples the number of INSERT/UPDATE statements needed. I don't know enough about SQL server to know which is best.
Is this approach sensible? Most data sets I've worked with of course have a separate row per item and you would then sort by date -- but when tracking clicks from thousands of users this would give me many hundreds of thousands of rows, which I would have to cull very often, and ordering and summing them would be hideous. Once the tracker is proven, I have plans to roll the click listener out over hundreds of pages, so it needs to scale.
In terms of design, clearly there is some redundancy in having both weekdays and monthdays. However, this was the only way I could think of to maintain a pointer to a column and quickly update it, and use a computed column. If I eliminated the weekdays table, I would need to get an additional computed column on the "monthdays" that summed the previous 7 days -- (e.g. if today is the 21st, then sum day_14, day_15, day_16... day_20). The calculation would have to update every day, which I imagine would be expensive. Hence the additional "weekday" table for a simple static calculation. I value simple and fast calculations more highly than small data storage.
Thanks in advance!