views:

97

answers:

4

Just when I get quite familiar with SQL statements once again, when pulling data from Google Analytics, I found that they don't use SQL, but rather, use Dimensions and Metrics and the combinations of them.

Why is a reason for that? I think it doesn't have a SQL interface (or a plain web server log download)? If so, how do SQL statements translate to Dimension, Metrics (and Segment and Filters)?

It seems that Metrics tend to be the "aggregates" such as count() or average(), and Dimension tends to be the logged values themselves (such as Browser == IE or Country == Australia), which is the same as the group by values. Filters is like conditionals, and what about Segment?

It seems that if we specify Dimensions, then it automatically does a group by and display that field as well. It does count() or sum() usually. What if we want average(*) instead? And what if we want it to show but don't want it to do a group by?

example website to experiment is at http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html

+1  A: 

I would imagine the answer lies in the fact that before the API was available the only way you could analyze data was through the Google Analytics interface. And it is there they extensively use "dimension" and "metric". Because non-technical people frequented it, they would have never introduced complex SQL constructs; just easier having dropdowns.

I am not totally sure the way Google Analytics data is stored is SQL friendly (i.e. columns and rows from tables). I have read they have developed their own internal way of storing this data.

M Schenkel
yes, maybe they sample the data instead of storing all the details in some cases
動靜能量
+1  A: 

use of the terms "Dimensions" and "Metrics" suggests that Google are using an OLAP database rather than a relational database.... SQL is used for relational databases: OLAP uses MDX or proprietary query languages (if Oracle)

Mark Baker
+1  A: 

It was probably developed internally using their own technologies like Big Table and Map-Reduce. Mapping and aggregating are the strengths of Map-Reduce type algorithms so it makes sense that the data is going to appear to be aggregated across various dimensions like that.

If you want to know more about them I'd suggest these Wikipedia articles:

John Munsch
+1  A: 

My guess is if you are asking a question like this, you probably having gone much past looking at some of the out-of-the-box reports such as simple page views. If that's all you are doing then you're vastly missing the point and power of Web Analytics. Web analytics in general (not just GA) is about looking at trends in data, over time. And the data itself is acquired by following certain rules and behaviors, both pre-defined and user-defined.

Much of the data for reports cannot be easily grabbed from a direct database query, because the data is based on abstracts such as "xyz over time" and aggregated data. For instance, the concept of "scope" for dimensions and metrics, where a variable and/or value will report data about single page view / events, or over the course of a visit (session) or even over a user defined amount of time (like "make this last a month" or "make this last until some event occurs," like a specific variable or variable type being popped).

Because most of reporting involves higher level concepts of data retrieval, the database is abstracted away, and a "framework" is put in place (the report interface) to help you build reports showing the trended data. Even if you are a database expert, it would take way too much time and effort to try and extract the data manually for virtually everything except the most basic data like page views. And basic data like that is not very actionable.

Look at campaign tracking as an example. It all starts with a single var=value. When a user clicks on a link and goes to a page with that var=value in the url, the tracking code grabs that value and starts attributing not only the data about the page (the url, time, type of browser, list goes on and on) but also all the other data collected from custom coding. Then there are other settings you can apply to it, like attaching a cost-per-click or some weighted measure, attributing success towards a goal or event, etc...based on other rules (first vs. last click attribution, etc...). The list of stuff coming into play and what is considered goes on and on and on. Go ahead and try to make those database query strings yourself. Now wash, rinse and repeat because that was just one campaign code. I've had clients with thousands of campaign codes, with many more being added every day. Oh, and also on top of that, tweaking or making altogether new queries based on how you want the actual report to show the data. Cross-referencing and breaking down by xyz. Looking at funnels and scenarios based on that data. And that's just for campaigns, one thing out of many things.

So to make a long story short, think of a report interface as a framework for databases, with predefined queries you can tweak, to make people's reporting efforts significantly easier, especially since most people aren't database experts.

Crayon Violent