Hi,
I'm writing a small Django-based frontend to collect and graph internet usage statistics.
Currently, our ISP sends us monthly text files that show the average bytes/second for every 5-minute interval. E.g.:
Date Time In Out
28.03.2010 00:00:00 204304 228922
28.03.2010 00:05:00 104231 222998
28.03.2010 00:10:00 264292 210194
28.03.2010 00:15:00 212982 213048
28.03.2010 00:20:00 90543 139082
28.03.2010 00:25:00 71620 175556
28.03.2010 00:30:00 65382 207898
28.03.2010 00:35:00 68676 213925
28.03.2010 00:40:00 62974 204304
28.03.2010 00:45:00 54341 208427
28.03.2010 00:50:00 98822 155641
We multiply these numbers by 300 (5 minutes x 60 seconds) to get the total bytes in/out for each 5-minute block.
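For reference, here is a rough sketch of that parsing and multiplication step in plain Python. The function and constant names are made up for illustration, and it assumes the file is whitespace-separated with a single header line and day.month.year dates, as in the sample above.

```python
from datetime import datetime

# Two lines copied from the ISP sample above, plus the header.
SAMPLE = """\
Date Time In Out
28.03.2010 00:00:00 204304 228922
28.03.2010 00:05:00 104231 222998
"""

INTERVAL_SECONDS = 5 * 60  # each row covers a 5-minute block


def parse_usage(text):
    """Return a list of (timestamp, total_bytes_in, total_bytes_out) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header line and anything that is not a 4-column data row.
        if len(parts) != 4 or parts[0] == "Date":
            continue
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%d.%m.%Y %H:%M:%S")
        rate_in, rate_out = int(parts[2]), int(parts[3])
        # Average bytes/second times seconds in the block gives total bytes.
        rows.append((stamp, rate_in * INTERVAL_SECONDS, rate_out * INTERVAL_SECONDS))
    return rows


for stamp, bytes_in, bytes_out in parse_usage(SAMPLE):
    print(stamp, bytes_in, bytes_out)
```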
(I'm actually curious why the ISP gives us average bytes/sec like that, instead of the total bytes transferred in each 5-minute interval. For anybody in the know, is there some kind of technical basis for that?)
It's then fairly trivial to tally these up to get daily or hourly totals, and graph them.
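The tallying I have in mind is roughly the following, sketched in plain Python rather than through the Django ORM. The `tally` function is a made-up name, and it assumes each timestamp marks the start of its 5-minute block, which I haven't confirmed with the ISP.

```python
from collections import defaultdict
from datetime import datetime


def tally(rows, period="hour"):
    """Sum (timestamp, bytes_in, bytes_out) rows into hourly or daily totals.

    Assumes each timestamp marks the START of its 5-minute block.
    """
    totals = defaultdict(lambda: [0, 0])
    for stamp, bytes_in, bytes_out in rows:
        if period == "hour":
            key = stamp.replace(minute=0, second=0, microsecond=0)
        elif period == "day":
            key = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("period must be 'hour' or 'day'")
        totals[key][0] += bytes_in
        totals[key][1] += bytes_out
    # Return plain tuples, ordered by the start of each period.
    return {key: tuple(value) for key, value in sorted(totals.items())}


# Illustrative rows only (small made-up byte counts, not real ISP data).
example_rows = [
    (datetime(2010, 3, 28, 0, 0), 300, 600),
    (datetime(2010, 3, 28, 0, 55), 100, 200),
    (datetime(2010, 3, 28, 1, 0), 10, 20),
]
print(tally(example_rows, "hour"))
print(tally(example_rows, "day"))
```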
My question is pretty simple - in Django, what would be an efficient model for storing these?
The total bytes in/out doesn't actually belong to a single point in time; it covers a period. Is there much point in storing each datapoint with both a start and an end time, alongside the total bytes in/out? That feels cleaner, but is it bad to store just a single date/time and assume it refers to the five-minute interval before or after it? (To be honest, I'm not even sure which of those two it is.)
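To make the two options concrete, here is a rough plain-Python sketch (deliberately not Django model code; all class and field names are made up for illustration). It assumes the timestamp marks the start of the interval, which, as I said, I haven't confirmed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)  # fixed block length in the ISP files


@dataclass(frozen=True)
class PeriodSample:
    """Option 1: store both ends of the period explicitly."""
    start: datetime
    end: datetime
    bytes_in: int
    bytes_out: int


@dataclass(frozen=True)
class PointSample:
    """Option 2: store one timestamp and imply the period."""
    stamp: datetime
    bytes_in: int
    bytes_out: int

    def as_period(self, interval=INTERVAL):
        # ASSUMPTION: stamp is the start of the block. If it turns out to be
        # the end, the period would be (stamp - interval, stamp) instead.
        return PeriodSample(self.stamp, self.stamp + interval,
                            self.bytes_in, self.bytes_out)


point = PointSample(datetime(2010, 3, 28, 0, 0), 61291200, 68676600)
print(point.as_period())
```

With a fixed 5-minute interval the end time is fully derivable from the start, so the second option stores less but bakes the interval assumption into the code.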
Or are there cleverer or more efficient ways of storing this data? The end result is that we'd want to graph the totals per hour or per day (or over any arbitrary period), and also graph the actual flow rates.
I'm trying to find an efficient way of storing the data that's also easy to query for the above statistics.
Also, are there any particularly good visualisations/stats we could use here?
Cheers, Victor