ansaurus

Question

Basic site analytics doesn't tally with Google data

Answer 1

+1 A:

Lots of people block Google Analytics for privacy reasons.

Martin Smith 2010-03-23 14:03:33

Interesting! I doubt this is a large proportion of our traffic though. It's definitely not a technical community.

Jenkz 2010-03-23 16:35:53

It takes about 2 seconds to install AdBlock in Firefox, technical community not required. This blocks Google Analytics by default.

mxmissile 2010-03-24 04:02:07

Most of my users haven't heard of Firefox, have no idea what "installing" something is, and certainly wouldn't have a clue what AdBlock would do or how to get it. It's 90% Internet Explorer straight out of the box. But I take your point :)

Jenkz 2010-03-24 10:25:01

Answer 2

A:

Under-reporting by the client-side rig versus server-side seems to be the usual outcome of these comparisons.

Here's how I've tried to reconcile the disparity when I've come across these studies:

Data Sources recorded in server-side collection but not client-side:

hits from mobile devices that don't support JavaScript (this is probably a significant source of disparity between the two collection techniques; e.g., a Jan 07 comScore study showed that 19% of UK Internet users access the Internet from a mobile device)
hits from spiders and bots (which you mentioned already)

Data Sources/Events that server-side collection tends to record with greater fidelity (far fewer false negatives) compared with JavaScript page tags:

hits from users behind firewalls, particularly corporate firewalls: firewalls block the page tag, and some are configured to reject or delete cookies.
hits from users who have disabled JavaScript in their browsers: five percent, according to W3C data.
hits from users who exit the page before it loads. Again, this is a larger source of disparity than you might think. The most frequently cited study to support this was conducted by Stone Temple Consulting. It compared two identical sites configured with the same web analytics system, differing only in that the JS tracking code was placed at the bottom of the pages on one site and at the top of the pages on the other; the difference in unique visitor traffic was 4.3%.

FWIW, here's the scheme I use to identify and remove spiders, bots, etc.:

monitor requests for our robots.txt file, then filter all other requests from the same IP address + user agent (not all spiders will request robots.txt, of course, but with minuscule error, any request for this resource is probably a bot).
compare user agents and IP addresses against published lists: iab.net and user-agents.org publish the two lists that seem to be the most widely used for this purpose.
pattern analysis: nothing sophisticated here; we look at (i) page views as a function of time (i.e., clicking a lot of links with 200 msec on each page is probative); (ii) the path by which the 'user' traverses our site: is it systematic and complete, or nearly so (like following a back-tracking algorithm)?; and (iii) precisely timed visits (e.g., 3 am each day).
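
As a rough illustration only, the robots.txt check and the timing part of the pattern analysis might be sketched like this in Python. The request tuple layout, the function names, and the thresholds (0.2 seconds, three fast hits) are assumptions made up for the example, not the code actually used here; the published-list comparison is left out.

```python
from collections import defaultdict


def find_bot_clients(requests, max_gap_seconds=0.2, min_fast_hits=3):
    """Return the set of (ip, user_agent) pairs that look like bots.

    `requests` is an iterable of (ip, user_agent, path, timestamp_seconds)
    tuples; this layout is hypothetical, adapt it to your own log format.
    """
    flagged = set()
    timestamps = defaultdict(list)
    for ip, agent, path, ts in requests:
        client = (ip, agent)
        # Rule 1: any client that asks for robots.txt is treated as a bot.
        if path == "/robots.txt":
            flagged.add(client)
        timestamps[client].append(ts)

    # Rule 2: several consecutive requests spaced closer together than a
    # human could plausibly click (threshold values are assumptions).
    for client, stamps in timestamps.items():
        stamps.sort()
        fast_hits = sum(
            1 for a, b in zip(stamps, stamps[1:]) if b - a <= max_gap_seconds
        )
        if fast_hits >= min_fast_hits:
            flagged.add(client)
    return flagged


def drop_bot_requests(requests):
    """Return only the requests whose (ip, user_agent) was not flagged."""
    requests = list(requests)
    bots = find_bot_clients(requests)
    return [r for r in requests if (r[0], r[1]) not in bots]
```

Filtering on the IP address + user agent pair rather than on IP alone avoids throwing away real visitors who share an address with a crawler (e.g. behind a corporate proxy).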

doug 2010-03-24 02:51:16

Thanks for the detail doug.

Jenkz 2010-03-24 15:33:48

Answer 3

A:

The biggest reasons are that users have to have JavaScript enabled and have to load the entire page, as the code is often in the footer. AWStats and other server-side solutions like yours will get everything. Plus, Analytics does a really good job of identifying bots and scrapers.

mdvaldosta 2010-03-24 03:29:15
