I need to store large amounts of metering data in a database. A record consists of an id that identifies the data's source, a timestamp, and a value. Records are later retrieved by id and timestamp.
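For concreteness, this is roughly the straightforward one-row-per-record layout I'm starting from (sketched in Python with SQLite purely for illustration; names and types are mine, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect("metering.db")

# One row per measurement: source id, timestamp, value.
conn.execute(
    "CREATE TABLE IF NOT EXISTS readings ("
    " source_id INTEGER NOT NULL,"
    " ts        REAL    NOT NULL,"   # seconds since epoch; the real app may use a DATETIME
    " value     REAL    NOT NULL,"
    " PRIMARY KEY (source_id, ts))"
)

# Typical retrieval: all rows for one source within a time range.
rows = conn.execute(
    "SELECT ts, value FROM readings WHERE source_id = ? AND ts BETWEEN ? AND ?",
    (42, 1_600_000_000.0, 1_600_086_400.0),
).fetchall()
```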
In my previous experience (I am developing the successor of an application that has been in production use for the last five years), disk I/O is the relevant performance bottleneck for data retrieval. (See also this other question of mine.)
As I am never looking for single rows but always for (possibly large) groups of rows matching a range of ids and timestamps, a fairly obvious optimization seems to be to store larger, compressed chunks of data that are accessed via a much smaller index (e.g. by day number) and are decompressed and filtered on the fly by the application.
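To make the chunking idea concrete, here's a rough sketch of what I mean (again Python with SQLite, plus zlib, purely for illustration; the table layout, the packed record format and the day-based chunk key are all assumptions, not the final design):

```python
import sqlite3
import struct
import zlib

# Fixed-width (timestamp, value) pairs inside a chunk; this layout is an
# assumption -- the real record format may differ.
RECORD = struct.Struct("<dd")

def day_number(ts: float) -> int:
    """Chunk key: the (UTC) day a timestamp falls on -- one possible partitioning."""
    return int(ts // 86400)

def create_chunk_table(conn: sqlite3.Connection) -> None:
    # The index now has one row per (source, day) instead of one row per measurement.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " source_id INTEGER NOT NULL,"
        " day       INTEGER NOT NULL,"
        " data      BLOB    NOT NULL,"
        " PRIMARY KEY (source_id, day))"
    )

def store_day_chunk(conn: sqlite3.Connection, source_id: int,
                    records: list[tuple[float, float]]) -> None:
    """Compress all of one source's records for one day into a single blob."""
    day = day_number(records[0][0])
    blob = zlib.compress(b"".join(RECORD.pack(ts, val) for ts, val in records))
    conn.execute(
        "INSERT OR REPLACE INTO chunks (source_id, day, data) VALUES (?, ?, ?)",
        (source_id, day, blob),
    )

def load_range(conn: sqlite3.Connection, source_id: int,
               start_ts: float, end_ts: float):
    """Fetch only the chunks that cover [start_ts, end_ts], then filter in memory."""
    cursor = conn.execute(
        "SELECT data FROM chunks WHERE source_id = ? AND day BETWEEN ? AND ?",
        (source_id, day_number(start_ts), day_number(end_ts)),
    )
    for (blob,) in cursor:
        for ts, val in RECORD.iter_unpack(zlib.decompress(blob)):
            if start_ts <= ts <= end_ts:
                yield ts, val
```

The interesting part is hidden in `day_number()`: make the chunks too small and a request touches many of them, make them too large and most of the decompressed data gets thrown away by the filter. That trade-off is exactly what the question below is about.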
What I'm looking for is the best strategy for deciding which portion of the data to put into one chunk. In a perfect world, each user request would be fulfilled by retrieving one chunk of data and using most or all of it. So I want to minimize the number of chunks I have to load per request, and I want to minimize the excess data per chunk.
I'll post an answer below containing my ideas so far, and make it community wiki so you can expand on it. Of course, if you have a different approach, post your own.
ETA: S. Lott has posted this answer below, which is helpful to the discussion even if I can't use it directly (see my comments). The point here is that the "dimensions" of my "facts" are (and should be) influenced by the end user and change over time. This is a core feature of the app and actually the reason I wound up with this question in the first place.