If you're doing min/max/avg queries, do you prefer to use aggregation tables or simply query across a range of rows in the raw table?
This is obviously a very open-ended question and there's no one right answer, so I'm just looking for people's general suggestions. Assume that the raw data table consists of a timestamp, a numeric foreign key (say a user id), and a decimal value (say a purchase amount). Furthermore, assume that there are millions of rows in the table.
I have done both and am torn. On one hand aggregation tables have given me significantly faster queries but at the cost of a proliferation of additional tables. Displaying the current values for an aggregated range either requires dropping entirely back to the raw data table or combining more fine grained aggregations. I have found that keeping track in the application code of which aggregation table to query when is more work that you'd think and that schema changes will be required, as the original aggregation ranges will invariably not be enough ("But I wanted to see our sales over the last 3 pay periods!").
On the other hand, querying from the raw data can be punishingly slow but lets me be very flexible about the data ranges. When the range bounds change, I simply change a query rather than having to rebuild aggregation tables. Likewise the application code requires fewer updates. I suspect that if I was smarter about my indexing (i.e. always having good covering indexes), I would be able to reduce the penalty of selecting from the raw data but that's by no means a panacea.
Is there anyway I can have the best of both worlds?