Can OLAP be done in BigTable?

In the past I built web analytics using OLAP cubes running on MySQL. An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. pagename, useragent, IP) and a bunch of values (e.g. number of pageviews, number of visitors).

The queries that you run on a table like this are usually of the form (meta-SQL):

SELECT SUM(hits), SUM(bytes)
FROM MyCube
WHERE date='20090914' and pagename='Homepage' and browser!='googlebot'
GROUP BY hour

So you get the totals for each hour of the selected day with the mentioned filters. One snag was that queries on these cubes usually meant a full table scan (for various reasons), which put a practical limit on how large (in MiB) you could make them.

I'm currently learning the ins and outs of Hadoop and the like.

Running the above query as a MapReduce job on a BigTable looks easy enough: simply make 'hour' the key, filter in the map, and reduce by summing the values.
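To make that concrete, here is a minimal in-memory sketch of the map/filter/reduce flow in plain Python. It does not use Hadoop or any BigTable client; the sample rows and the function names (`map_row`, `reduce_hour`, `run_query`) are made up for illustration, and only the column names come from the meta-SQL query above.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the meta-SQL query.
ROWS = [
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 3, "bytes": 1200},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "googlebot", "hits": 10, "bytes": 5000},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "msie", "hits": 2, "bytes": 800},
    {"date": "20090914", "hour": 10, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 400},
    {"date": "20090914", "hour": 10, "pagename": "About", "browser": "firefox", "hits": 7, "bytes": 700},
    {"date": "20090913", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 4, "bytes": 100},
]


def map_row(row):
    """Map step: apply the WHERE filters and emit (hour, (hits, bytes))."""
    if (row["date"] == "20090914"
            and row["pagename"] == "Homepage"
            and row["browser"] != "googlebot"):
        yield row["hour"], (row["hits"], row["bytes"])


def reduce_hour(values):
    """Reduce step: sum hits and bytes for one hour."""
    total_hits = 0
    total_bytes = 0
    for hits, nbytes in values:
        total_hits += hits
        total_bytes += nbytes
    return total_hits, total_bytes


def run_query(rows):
    """Simulate the shuffle by grouping mapped values per key, then reduce."""
    grouped = defaultdict(list)
    for row in rows:
        for hour, value in map_row(row):
            grouped[hour].append(value)
    return {hour: reduce_hour(values) for hour, values in sorted(grouped.items())}


if __name__ == "__main__":
    for hour, (hits, nbytes) in run_query(ROWS).items():
        print(hour, hits, nbytes)
```

Note that every row still passes through the map step, so this is the same full scan as before, just spread over more machines.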

Can you run a query like the one I showed above (or at least one with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface, where the user gets their answer ASAP) instead of in batch mode?

If not, what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the like?

Thanks for the Zohmg suggestion. According to their website: "The core idea is to pre-compute aggregates and store them in a read-efficient manner". My idea is to start with a set of data and aggregate based on the user's needs at that moment.

Niels Basjes 2009-09-16 11:45:49

You want to pre-aggregate so that for each unique combination of dimensions you have at most one row; the run-time aggregation is then a question of rolling up the appropriate cross-section of the cube. Zohmg can point the way for you on how to do that. I know of at least one ad network that uses either Hypertable or HBase to do real-time dashboarding for their customers, so it's doable.
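As a rough illustration of that idea (not Zohmg's actual implementation or API), the sketch below collapses raw events into one cell per unique dimension combination and then rolls up a cross-section at query time. The dimension and measure names follow the question; `RAW_EVENTS`, `preaggregate`, and `roll_up` are hypothetical names, and a plain dict stands in for the HBase or Hypertable table.

```python
from collections import defaultdict

DIMENSIONS = ("date", "hour", "pagename", "browser")
MEASURES = ("hits", "bytes")

# Hypothetical raw events; a real system would read these from logs.
RAW_EVENTS = [
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 500},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 700},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "googlebot", "hits": 1, "bytes": 300},
    {"date": "20090914", "hour": 10, "pagename": "Homepage", "browser": "msie", "hits": 1, "bytes": 400},
    {"date": "20090914", "hour": 10, "pagename": "About", "browser": "msie", "hits": 1, "bytes": 200},
]


def preaggregate(raw_rows):
    """Batch step: keep at most one cell per unique combination of dimensions."""
    cube = defaultdict(lambda: [0] * len(MEASURES))
    for row in raw_rows:
        key = tuple(row[d] for d in DIMENSIONS)
        for i, measure in enumerate(MEASURES):
            cube[key][i] += row[measure]
    return dict(cube)


def roll_up(cube, group_by, keep):
    """Query step: sum the cells accepted by `keep`, grouped by one dimension."""
    group_index = DIMENSIONS.index(group_by)
    totals = defaultdict(lambda: [0] * len(MEASURES))
    for key, values in cube.items():
        cell = dict(zip(DIMENSIONS, key))
        if keep(cell):
            for i, value in enumerate(values):
                totals[key[group_index]][i] += value
    return {group: tuple(sums) for group, sums in sorted(totals.items())}


if __name__ == "__main__":
    cube = preaggregate(RAW_EVENTS)
    per_hour = roll_up(
        cube,
        "hour",
        lambda c: c["date"] == "20090914"
        and c["pagename"] == "Homepage"
        and c["browser"] != "googlebot",
    )
    print(per_hour)
```

The query step touches only pre-aggregated cells rather than raw events, which is where the speed-up would come from. In a real store you would also pick the row key so that the cells for one query sit next to each other and can be read with a short range scan.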

SquareCog 2009-09-16 13:54:21
