After being stumped by an earlier question (SO: google-analytics-domain-data-without-filtering), I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
I can now get the following metrics:
- Page Views - Grouped by subsite_id and date
- Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
- The usual "most visited page", "likely time to visit", etc.
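For reference, here is a minimal sketch of how the first two metrics might be computed. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own; the table name `hits` and the sample rows are made up for illustration, and only the column names come from the table above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table name "hits" is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits ("
    " hit_id INTEGER PRIMARY KEY, subsite_id INTEGER,"
    " timestamp TEXT, ip TEXT, url TEXT)"
)

# Made-up sample rows: (subsite_id, timestamp, ip, url).
sample = [
    (1, "2010-03-01 09:00:00", "203.0.113.5", "/a"),
    (1, "2010-03-01 09:05:00", "203.0.113.5", "/a"),  # repeat view: same IP, same URL
    (1, "2010-03-01 10:00:00", "203.0.113.9", "/a"),
    (1, "2010-03-01 10:30:00", "203.0.113.9", "/b"),
    (2, "2010-03-02 11:00:00", "203.0.113.5", "/c"),
]
conn.executemany(
    "INSERT INTO hits (subsite_id, timestamp, ip, url) VALUES (?, ?, ?, ?)",
    sample,
)

# Page views: every hit counts, grouped by subsite and calendar day.
page_views = conn.execute(
    "SELECT subsite_id, DATE(timestamp) AS day, COUNT(*)"
    " FROM hits GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

# Unique page views: collapse repeats of the same (subsite, day, url, ip) first,
# then count what is left per subsite and day.
unique_page_views = conn.execute(
    "SELECT subsite_id, day, COUNT(*) FROM ("
    " SELECT DISTINCT subsite_id, DATE(timestamp) AS day, url, ip FROM hits"
    ") AS d GROUP BY subsite_id, day ORDER BY subsite_id, day"
).fetchall()

print(page_views)
print(unique_page_views)
```

With these rows, the repeat view from the same IP on the same URL is counted as a page view but not as a unique page view.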
I've now compared my data to that in Google Analytics and found that Google has lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers: Google, Yahoo and Dotbot so far.
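Roughly, the IP exclusion could look like the sketch below. The networks listed are reserved documentation ranges used as placeholders, not real crawler ranges, and the names `CRAWLER_NETWORKS` and `is_crawler` are hypothetical; the real list would have to be collected and maintained separately.

```python
import ipaddress

# Placeholder ranges only (reserved documentation networks), NOT real crawler IPs.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler(ip: str) -> bool:
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CRAWLER_NETWORKS)


# Made-up hits, using the same ip/url fields as the table.
hits = [
    {"ip": "192.0.2.17", "url": "/a"},
    {"ip": "203.0.113.5", "url": "/a"},
    {"ip": "198.51.100.200", "url": "/b"},
]

# Keep only the hits that do not come from a listed network.
human_hits = [h for h in hits if not is_crawler(h["ip"])]
print(human_hits)
```

Matching on networks rather than single addresses keeps the list shorter, but it still only catches crawlers whose ranges are known in advance.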
Short Questions:
- Is it worth me collating a list of all major crawlers to discount, and is any such list likely to change regularly?
- Are there any other obvious filters that Google will be applying to GA data?
- What other data would you collect that might be of use further down the line?
- What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages, etc.) for their reference.