We've got a bunch of data that users may want to view windows of, and do so quickly. They may want to look at a window of the data that is a day, a week, a month, or an arbitrary beginning and ending date. Sorting and summing all of this in real time is proving to be painful for us, so I got the idea of doing something similar to mipmaps in 3D rendering: you store the same data pre-calculated at a variety of different scales and then interpolate the results across those scales. So I would already know what the numbers were for a year, a given month, a given week, and a given day for a store, and if someone asks for a particular range I can use the various scales to quickly add up something that gives the right result without necessarily reprocessing the full data set; I just retrieve four or five records and add or subtract them.
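
Here's a rough sketch of the kind of thing I'm imagining, cut down to just two levels (days and months) with made-up dict-based storage instead of our real tables:

    from datetime import date, timedelta

    # Rough sketch: two pre-computed "mipmap levels" (in reality these would
    # be tables we keep up to date as data arrives, not dicts; names are made up).
    daily_totals = {}    # {date: total}           finest level
    monthly_totals = {}  # {(year, month): total}  coarser level

    def range_total(start, end):
        """Sum the range [start, end] using whole-month totals wherever a
        full month fits, and daily totals for the partial edges."""
        total = 0.0
        d = start
        while d <= end:
            if d.day == 1:
                # First day of the next month (day 28 + 4 days always rolls over).
                next_month = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
                if next_month - timedelta(days=1) <= end:
                    total += monthly_totals.get((d.year, d.month), 0.0)
                    d = next_month
                    continue
            total += daily_totals.get(d, 0.0)
            d += timedelta(days=1)
        return total

    # January 28-31 by day, all of February from the month level, March 1-3 by day:
    print(range_total(date(2010, 1, 28), date(2010, 3, 3)))  # 0.0 until the dicts are filled

A year level would work the same way, and sometimes it would be cheaper to take a whole month and subtract a few days off the end rather than add the pieces up.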

Is this a real pattern? Does it make any sense, and are there places I can read about how to do it best, or are there much better ways of dealing with large chunks of data like this that need to be viewed in varying slices?

It seems like this should be a well-known, solved problem. For example, lots of people have stock portfolios, and they need to do this kind of thing every day. Our data isn't stock prices, but the idea is the same.

+1  A: 

In a sense, I think you kind of answered your own question here when you explained how Mip Mapping works (by interpolation/extrapolation).

At lower levels of "zoom" you'd just choose a lower resolution or sample rate of the data. The converse applies at higher levels of "zoom", to the point where you'd need to use interpolation (linear/polynomial/spline/etc.) on the data to estimate the values between your data points.
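
For instance, a quick linear interpolation between two known samples (the numbers here are made up) might look like:

    import numpy as np

    # Two known samples: totals measured at day 0 and day 7 (made-up values).
    known_days = np.array([0.0, 7.0])
    known_vals = np.array([120.0, 190.0])

    # Estimate the value at day 3 by interpolating linearly between the samples.
    estimate = np.interp(3.0, known_days, known_vals)
    print(estimate)  # 150.0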

JasonWyatt
I'm just wondering if there is a body of literature for this. Perhaps this is a crappy solution for data and it only works OK for visual stuff (which can be far less forgiving than money, trust me on that). I was kind of hoping someone would say, "Oh yeah, that's just what we do for blah blah blah and it works great," or, "I can see where you might think that naive solution would work, but you really should be using a Bumpletag structure and it would solve your problem a lot better."
John Munsch
+1  A: 

I like your analogy to mipmapping, and I think the field of Observations and Measurements, especially Sampling regimes, is probably where you'll find the abstract data design you're after. It gives you the theory behind the data, although they think more in terms of XML data models than relational tables.

I used to work with the guys at CSIRO behind this and a lot of the thinking comes from having to manage huge data sets for things like water sampling sensors. More detail at the SEEGrid wiki.

Andy Dent
+2  A: 

OK, I searched and searched and searched some more. Andy Dent's links made me start describing the data as "time-series" and that helped some. Then I ran across OLAP and realized that what I'm doing is reinventing that. I knew this had to be a well-known, thoroughly dealt-with problem and I was right. OLAP is it.

You build a bunch of aggregate tables that roll the data up along particular dimensions (time, in this case). You can even get tools like Mondrian that take queries written in another query language (MDX rather than SQL) together with a set of fact tables plus aggregates, and decide how best to answer each query from those tables.
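
As a tiny illustration of the idea (table and column names are just made up, and in practice the aggregates get maintained as the fact table changes):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (store_id INTEGER, sale_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [(1, "2010-01-05", 10.0), (1, "2010-01-20", 15.0), (1, "2010-02-03", 7.5)],
    )

    # Roll the fact table up along the time dimension once, ahead of time.
    conn.execute("""
        CREATE TABLE agg_sales_month AS
            SELECT store_id,
                   strftime('%Y-%m', sale_date) AS month,
                   SUM(amount) AS total
            FROM sales_fact
            GROUP BY store_id, month
    """)

    # A month-level query now reads the small aggregate table instead of
    # re-scanning every raw fact row.
    for month, total in conn.execute(
        "SELECT month, total FROM agg_sales_month WHERE store_id = 1 ORDER BY month"
    ):
        print(month, total)  # 2010-01 25.0 then 2010-02 7.5

Mondrian's job is essentially to notice that a query only needs month-level numbers and to read the aggregate table instead of the raw fact table automatically.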

John Munsch