
I have a bunch of statistics: # of users, # of families, the user/family ratio, etc. I'd like to store these daily so I can view this data historically.

However, I'm looking for the most effective way to store this data.

Should I run a cron job that writes a single row to the database, like DATE: today, USERS: 123, FAMILIES: 456, RATIO: 7.89? Or should I write multiple rows, like DATE: today, DATATYPE: users, VALUE: 123?
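
For concreteness, the two layouts I have in mind would be roughly this (sketched as SQLite tables; the table and column names are just illustrative):

    # Sketch of the two layouts I'm considering (SQLite; names are made up).
    import sqlite3

    conn = sqlite3.connect("stats.db")

    # Option 1: one wide row per day, one column per statistic.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS daily_stats (
            date     TEXT PRIMARY KEY,   -- e.g. '2009-02-13'
            users    INTEGER,
            families INTEGER,
            ratio    REAL
        )
    """)

    # Option 2: one narrow row per (day, statistic) pair.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS daily_stats_kv (
            date     TEXT,
            datatype TEXT,               -- 'users', 'families', 'ratio', ...
            value    REAL,
            PRIMARY KEY (date, datatype)
        )
    """)
    conn.commit()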

Or is there another option I can use that is more efficient or more effective?

Thanks!

A: 

If you store the same set of statistics every day, storing them as separate columns seems like the best bet. I believe it's marginally more efficient than writing each statistic to its own row. The only reason I can see to recommend the latter approach is if on some days you only store some of the statistics, but even then you could just put null values in the database for those days.

Incidentally, you could probably just append lines to a text file, if you don't need to use a full-blown database system.
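
Something like this from a daily cron job would do; the file name and the statistics function are just placeholders for however you actually compute the numbers:

    # Minimal daily-append sketch; run once a day from cron.
    from datetime import date

    def get_stats():
        # placeholder values; replace with your real queries/counts
        return {"users": 123, "families": 456}

    stats = get_stats()
    ratio = stats["users"] / stats["families"]

    with open("stats.log", "a") as f:
        f.write(f"{date.today().isoformat()} "
                f"users={stats['users']} families={stats['families']} "
                f"ratio={ratio:.2f}\n")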

David Zaslavsky
A: 

Write a single stored procedure to "update" stats with nullable parameters for each count. Inside the procedure decide whether to update an existing row by adding in the new numbers, or to insert a new row with the numbers given. The update/insert decision is simply whether a row exists for the year & day in question. This approach hides all the complexity from the app, and gives you running totals for the day as well.
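
Sketched in application code rather than as an actual stored procedure (since I don't know your database or schema), the logic would be roughly this; the table and column names are assumptions:

    # Rough sketch of the update-or-insert logic, written as a Python
    # function instead of a stored procedure.  Each count parameter is
    # optional, mirroring nullable procedure parameters.
    import sqlite3
    from datetime import date

    def record_stats(conn, users=None, families=None, day=None):
        # 'day' is a datetime.date; defaults to today.
        day = (day or date.today()).isoformat()
        row = conn.execute(
            "SELECT users, families FROM daily_stats WHERE date = ?", (day,)
        ).fetchone()
        if row is None:
            # No row for this day yet: insert one.
            conn.execute(
                "INSERT INTO daily_stats (date, users, families) VALUES (?, ?, ?)",
                (day, users or 0, families or 0),
            )
        else:
            # Row exists: add the new numbers in, giving a running total.
            conn.execute(
                "UPDATE daily_stats SET users = users + ?, families = families + ? "
                "WHERE date = ?",
                (users or 0, families or 0, day),
            )
        conn.commit()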

dwc
A: 

Most text-processing utilities like grep, sed, etc. work much better with multiple fields on a single line. That also makes reading the data back into a program easier, so I would go with that format.

alxp
A: 

As for the two database solutions you suggested, it probably doesn't matter. I would personally pick the first one, just to make queries slightly easier.

If you're really not collecting that much data (like, only a few megabytes), a flat text file might be the way to go, if only because it would be less of a hassle to manage (e.g. easier to back up, more portable, etc.). If it becomes big and unmanageable as a text file after a while, it would be trivial to write a script to put it into a database.
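
For example, the conversion could be something like this (assuming whitespace-separated key=value lines and the one-row-per-statistic table layout; adapt it to whatever format you actually end up with):

    # Rough sketch of loading the flat file into a database later on.
    # Assumes lines like "2009-02-13 users=123 families=456 ratio=0.27".
    import sqlite3

    conn = sqlite3.connect("stats.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS daily_stats_kv (
            date TEXT, datatype TEXT, value REAL,
            PRIMARY KEY (date, datatype)
        )
    """)

    with open("stats.log") as f:
        for line in f:
            if not line.strip():
                continue
            day, *pairs = line.split()
            for pair in pairs:
                name, value = pair.split("=")
                conn.execute(
                    "INSERT OR REPLACE INTO daily_stats_kv VALUES (?, ?, ?)",
                    (day, name, float(value)),
                )
    conn.commit()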

Chris
A: 

I think RRDtool (home page) might work well for what you describe. It is probably tailored to more regular sampling than you seem to need, but that should not be a problem.

RRDtool is a round-robin database that operates on a (configurable) fixed time period, wrapping around and overwriting the oldest entries. If you would rather save everything (as I typically do), you can just use a period of several decades. Even when sampling the number of bytes sent/received by a network card every 5 minutes, the database is only about 65 MB for 20 years.
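
For what you describe, a once-a-day setup might look something like this (driving the rrdtool command line from Python; the data source names and the 20-year retention are only illustrative):

    # Sketch of a daily RRDtool setup via the rrdtool CLI.
    import subprocess

    # Create the database once: one sample per day (step 86400 s), two GAUGE
    # data sources, and a single archive keeping ~20 years of daily values.
    subprocess.run([
        "rrdtool", "create", "stats.rrd", "--step", "86400",
        "DS:users:GAUGE:172800:0:U",
        "DS:families:GAUGE:172800:0:U",
        "RRA:AVERAGE:0.5:1:7300",
    ], check=True)

    # Then, from a daily cron job, push the current numbers
    # ("N" means "now" as the timestamp):
    subprocess.run(
        ["rrdtool", "update", "stats.rrd", "N:123:456"],
        check=True,
    )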

You can use RRDtool by itself, but its real power comes from using it in tandem with MRTG (home page), which gives you daily/weekly/monthly/yearly graphs of the activity you are monitoring. While MRTG is primarily written to capture and present network traffic, you can monitor more or less anything else as well, for instance CPU or disk temperature (which is on my todo list to set up).

There are also alternative graphing solutions that do not involve MRTG; if you search for "rrdtool disk temperature" you will find several.

hlovdal