I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:

  • the time resolution should be variable from one second to a year
  • there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished

Are there best practices, or tools/plugins for Rails, that help handle this kind and amount of data? Are there database engines specifically tailored towards this, or ones with helpful features (e.g. CouchDB indexes)?

EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
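To illustrate why grouping by timestamp/granularity miscounts long-lived connections, here is a minimal Python sketch. The sample data and the function names (naive_counts, spanning_counts) are made up for illustration, and it assumes each connection is stored as a start and end time in Unix seconds.

```python
from collections import Counter

HOUR = 3600

# Hypothetical sample data: (user, connect_ts, disconnect_ts) in Unix seconds.
connections = [
    ("alice", 0 * HOUR + 600, 2 * HOUR + 300),  # overlaps hours 0, 1 and 2
    ("bob", 1 * HOUR + 60, 1 * HOUR + 120),     # entirely within hour 1
]


def naive_counts(conns, granularity):
    """Count each connection only in the bucket where it started
    (what a plain GROUP BY on the start timestamp would give)."""
    return Counter(start // granularity * granularity for _, start, _ in conns)


def spanning_counts(conns, granularity):
    """Count each connection in every bucket it overlaps."""
    counts = Counter()
    for _, start, end in conns:
        first = start // granularity * granularity
        last = end // granularity * granularity
        for bucket in range(first, last + granularity, granularity):
            counts[bucket] += 1
    return counts


naive = naive_counts(connections, HOUR)
spanning = spanning_counts(connections, HOUR)
```

With this data, the naive version sees alice only in the first hour, while the spanning version counts her in all three hours she was connected.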

A: 

I think you can use MySQL timestamps for this.

rogerdpack
That solves the first problem (I can just divide the timestamp by the granularity and group by that), but not the second, much harder one. I fear I won't get around preprocessing the data.
moeffju
Possibly you can use MySQL timestamps in a range condition like "where time < xxx and time > yyy".
rogerdpack
That would mean having to run one query for each bucket, and aggregating in the application layer. I don't see how that will scale.
moeffju
A: 

The way I solved it in the end was to pre-process the data into per-minute buckets, so there is one row for every minute that each event spans. That makes selecting easy and fast enough, and it yields correct results. To get a different granularity, you can do integer arithmetic on the timestamp column: select floor(timestamp/factor)*factor and group by the same expression. (floor rather than abs, because the point is to round each timestamp down to the start of its bucket.)
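A rough sketch of this approach, using Python with an in-memory SQLite database so it runs anywhere. The table name connection_minutes, its columns, and the helper functions are assumptions for illustration, not the actual schema used. SQLite's / already truncates for integers; in MySQL you would use FLOOR(ts / factor) or the DIV operator instead.

```python
import sqlite3

MINUTE = 60


def expand_to_minutes(username, start_ts, end_ts):
    """Return one (username, minute_ts) row per minute the event overlaps."""
    first = start_ts // MINUTE * MINUTE
    last = end_ts // MINUTE * MINUTE
    return [(username, m) for m in range(first, last + MINUTE, MINUTE)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connection_minutes (username TEXT, minute_ts INTEGER)"
)

# Hypothetical raw log entries: (user, connect_ts, disconnect_ts).
raw = [("alice", 600, 7500), ("bob", 3660, 3720)]
for username, start, end in raw:
    conn.executemany(
        "INSERT INTO connection_minutes VALUES (?, ?)",
        expand_to_minutes(username, start, end),
    )


def users_per_bucket(conn, factor):
    """Downsample the per-minute rows to buckets of `factor` seconds."""
    sql = """
        SELECT (minute_ts / ?) * ? AS bucket,
               COUNT(DISTINCT username) AS users
        FROM connection_minutes
        GROUP BY bucket
        ORDER BY bucket
    """
    return conn.execute(sql, (factor, factor)).fetchall()


hourly = users_per_bucket(conn, 3600)
```

COUNT(DISTINCT username) is used so that a user with many minute rows in one bucket is counted once. The trade-off is storage: a connection that stays open for days produces thousands of rows.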

moeffju