Now, I realise the initial response to this is likely to be "you can't" or "use analytics", but I'll continue in the hope that someone has more insight than that.

Google AdWords with "autotagging" appends a "gclid" (presumably "Google click id") to the link that sends you to the advertised site. It appears in the web log since it's a query parameter, and it's used by Analytics to tie that visit to the ad/campaign.
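As a minimal sketch of the first step, pulling the gclid out of a logged request URL can be done with the standard library. The function name `extract_gclid` and the example URL are hypothetical; the gclid value shown is made up, not a real click id.

```python
from urllib.parse import urlsplit, parse_qs

def extract_gclid(url):
    """Return the gclid query parameter from a URL, or None if it is absent."""
    query = urlsplit(url).query
    values = parse_qs(query).get("gclid")
    return values[0] if values else None

# Hypothetical request path as it might appear in an access log.
example = "/landing?product=42&gclid=CPnZ3uXqlZcCFQ0wQgodEXAMPLE"
print(extract_gclid(example))
```

Real log formats differ, so the request URL would first have to be split out of each log line before this is applied.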

What I would like to do is to extract any useful information from the gclid in order to do our own analysis on our traffic. The reasons for this are:

  • Stats are imperfect, but if we are collating them, we know exactly what assumptions we have made, and how they were calculated.
  • We can tie the data to the rest of our data and produce far more accurate stats wrt conversion rate.
  • We don't have to rely on javascript for conversions.

Now it is clear that the gclid is base64 encoded (or some close variant), and some parts of it vary more than others. Beyond that, I haven't been able to determine what any of it relates to.
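A sketch of the decoding step, assuming the "close variant" is URL-safe base64 (`-` and `_` in place of `+` and `/`) with the trailing `=` padding stripped. That assumption is not confirmed anywhere in this thread, and the sample below is synthetic rather than a real gclid.

```python
import base64

def decode_gclid(gclid):
    """Decode a gclid-like string as URL-safe base64, restoring stripped '=' padding."""
    padded = gclid + "=" * (-len(gclid) % 4)
    return base64.urlsafe_b64decode(padded)

# Synthetic stand-in for a gclid: 16 arbitrary bytes, encoded with padding stripped.
sample = base64.urlsafe_b64encode(bytes(range(8, 24))).rstrip(b"=").decode("ascii")
raw = decode_gclid(sample)
print(sample, "->", raw.hex())
```

Working on the decoded bytes rather than the encoded characters matters because each base64 character carries only 6 bits, so one source byte is spread across two adjacent characters.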

Does anybody have any insight into how I might approach decoding this, or has anybody already related gclids back to campaigns or even accounts?

I have spoken to a couple of people at google, and despite their "don't be evil" motto, they were completely unwilling to discuss the possibility of divulging this information, even under an NDA. It seems they like the monopoly they have over our web stats.

+1  A: 

Well, this is no answer, but the approach is similar to how you'd tackle any cryptography problem.

Possibility 1: They're just random, in which case, you're screwed. This is analogous to a one-time pad.

Possibility 2: They "mean" something. In that case, you have to control the environment.

  1. Get a good database of them. Find gclids for your site, and others. Record the times at which all clicks occur, and any other potentially useful data.
  2. Get cracking! As you have started already, start regressing your collected data against your known values, and see if you can find patterns using decryption techniques.
  3. Start scraping random gclid's, and see where they take you.

I wouldn't hold high hope for this to be successful though, but I do wish you luck!

Gregg Lind
Re 1 - I'm pretty convinced they're not random. Our gclids are similar, other people's are similar too, but dissimilar to ours. They're definitely not a simple incrementing id. 2.1 - This is *hard* since there are a lot of gclids you don't see (if they don't click on them)....
Draemon
... I have collected a large list from the logs, and I have identified which bytes change more than others, and my brain shouts "this isn't random" but beyond that nothing has led anywhere. 2.2/2.3 - I'd love a link to any techniques or tools - instinct hasn't got me very far.
Draemon
tbh I don't hold out much hope either, but it would be very cool - and I really don't think this is something google should have a monopoly over. I just have a niggling feeling it's "easy if you know how"
Draemon
http://blog.merjis.com/2007/07/16/click-fraud-google-adwords-and-gclid/ seems to have a lot of discussion about the role of the gclid, and googling seems to yield a lot of basic insight. They're new to me, so I'll poke in if I learn more.
Gregg Lind
Since Google Analytics can understand the gclid, it's likely to be a two-way hash, which is a plus. Work on referrer_id.
Gregg Lind
+3  A: 

By far the easiest solution is to manually tag your links with Google Analytics campaign tracking parameters (utm_source, utm_campaign, utm_medium, etc.) and then pull out that data.
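A minimal sketch of manual tagging, assuming you generate the destination URLs yourself before entering them in AdWords. The helper name `add_utm_tags` and the example values are hypothetical; only the `utm_source`, `utm_medium` and `utm_campaign` parameter names come from the answer above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_utm_tags(url, source, medium, campaign):
    """Append utm_* campaign parameters to a URL, keeping any existing query."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))

tagged = add_utm_tags("http://example.com/landing?product=42",
                      "google", "cpc", "spring_sale")
print(tagged)
```

Because these parameters arrive in the landing URL, they show up in the web log in plain text and can be read back without any JavaScript.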

The gclid is dependent on more than just the adwords account/campaign/etc. If you click on the same adwords ad twice, it could give you different gclids, because there's all sorts of session and cost data associated with that particular click as well.

Gclid is probably not 100% random, true, but I'd be very surprised and concerned if it were possible to extract all your Adwords data from that number. That would be a HUGE security flaw (i.e. an arbitrary user could view your Adwords data). More likely, a pseudo-random gclid is generated with every impression, and if that ad is clicked on, the gclid is logged in Adwords (otherwise it's thrown out). Analytics then uses that number to reconcile the data with Adwords after the fact. Other than that, there's no intrinsic value in the gclid number itself.

In regards to your last point, attempting to crack or reverse-engineer this information is explicitly forbidden in both the Google Analytics and Google Adwords Terms of Service, and is grounds for a permanent ban. Additionally, the TOS that you agreed to when signing up for these services says that it is not your data to use in any way you feel like. Google is providing a free service, so there are strings attached. If you don't like not having complete control over your data, then there are plenty of other solutions out there. However, you will pay a premium for that kind of control.

Google makes nearly all their money from selling ads. Adwords is their biggest money-making product. They're not going to give you confidential information about how it works. They don't know who you are, or what you're going to do with that information. It doesn't matter if you sign an NDA and they have legal recourse to sue you; if you give away that information to a competitor, your life isn't worth enough to pay back the money you will have lost them.

Sorry to break it to you, but "Don't be Evil" or not, Google is a business, not a charity. They didn't become one of the most successful companies in the world by giving away their search algorithm to the first guy who asked for it.

Chris
+2  A: 

FYI, I just posted a quick analysis of some gclid data from my sites on this post. There definitely is some structure to the gclid, but it is difficult to decipher.

Thanks for the info - nice to know someone else is curious! You really need to decode the characters before looking for patterns, as base64 will spread source bytes over adjacent encoded bytes. I've done similar analysis myself and have similarly convinced myself there's some sort of pattern, but I have no idea what.
Draemon
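The "decode first, then look for patterns" idea in the comment above could be sketched like this: decode each value, then count how many distinct byte values occur at each offset. It assumes URL-safe base64 with stripped padding, and the sample values are synthetic, built with a fixed two-byte prefix purely for illustration. Nothing here reflects the real gclid layout.

```python
import base64

def decode(gclid):
    """Decode URL-safe base64 after restoring any stripped '=' padding."""
    return base64.urlsafe_b64decode(gclid + "=" * (-len(gclid) % 4))

def distinct_values_per_position(gclids):
    """For each byte offset, count the distinct byte values seen across the sample."""
    decoded = [decode(g) for g in gclids]
    width = min(len(d) for d in decoded)  # compare only offsets present in every value
    return [len({d[i] for d in decoded}) for i in range(width)]

def encode(raw):
    """Helper for building synthetic test values (not part of the analysis)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

# Synthetic sample: constant 2-byte prefix followed by 2 varying bytes.
samples = [encode(b"\x08\x01" + bytes([i, (i * 7) % 256])) for i in range(5)]
print(distinct_values_per_position(samples))
```

Offsets with a count of 1 are constant across the sample, while offsets with high counts vary per click. With real data, grouping the sample by site or by day before counting may show which offsets track which property.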
The character analysis is very interesting, and essentially proves that there is data encoded in these gclids... very cool.
ojrac
A: 

A different approach is to simply look at the referrer data which will at least provide the keyword which was searched.

Ophir Prusak
referrer data is sent by the client (user agent) and is unreliable.
Draemon
A: 

I agree with Ophir and Chris. My feeling is that it is purely a serial number / unique click ID, which only opens up its secrets when the Analytics and Adwords systems talk to each other behind the scenes.

Knowing this, I'd recommend looking at the referring URL and pulling as much as possible from this to use in your back end click tracking setup.

For example, I live in NZ, and am using Firefox. This is a search from the Firefox Google toolbar for "stack overflow": http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB

You can see that: a) I'm using the .nz domain, b) my keyword is "stack+overflow", c) I'm running Firefox.
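As a rough sketch, the three pieces of information above can be pulled out of that referrer with the standard library. The function name `parse_search_referrer` is hypothetical, and the `q` and `client` parameter names are taken from the example URL itself rather than from any documented contract, so they may change.

```python
from urllib.parse import urlsplit, parse_qs

def parse_search_referrer(referrer):
    """Extract the search host, keyword and client hint from a search referrer URL."""
    parts = urlsplit(referrer)
    params = parse_qs(parts.query)
    return {
        "host": parts.hostname,                      # e.g. country-specific domain
        "keyword": params.get("q", [None])[0],       # '+' is decoded to a space
        "client": params.get("client", [None])[0],   # browser/toolbar hint, if present
    }

referrer = ("http://www.google.co.nz/search?q=stack+overflow&ie=utf-8&oe=utf-8"
            "&aq=t&client=firefox-a&rlz=1R1GGLL_en-GB")
info = parse_search_referrer(referrer)
print(info)
```

As noted elsewhere in this thread, the referrer is supplied by the user agent, so any of these fields may be missing or wrong.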

Finally, if you also stash the full landing page URL, you can store the GCLID, which will tell you the visitor came from paid, whereas if it doesn't have a GCLID, then the user must have come from natural search (if URL tagging is enabled of course).
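A sketch of that paid-versus-natural split, assuming auto-tagging is enabled so that every paid click carries a gclid. The function name, the labels and the crude "google." host check are all illustrative assumptions, not an established rule.

```python
from urllib.parse import urlsplit, parse_qs

def classify_visit(landing_url, referrer=None):
    """Label a visit 'paid', 'organic' or 'other' from its landing URL and referrer."""
    has_gclid = "gclid" in parse_qs(urlsplit(landing_url).query)
    ref_host = (urlsplit(referrer).hostname or "") if referrer else ""
    from_google = "google." in ref_host  # crude heuristic, an assumption of this sketch
    if has_gclid:
        return "paid"
    if from_google:
        return "organic"
    return "other"

# Hypothetical landing URLs; the gclid value is made up.
print(classify_visit("/landing?gclid=CPnZ3uXqEXAMPLE",
                     "http://www.google.co.nz/search?q=stack+overflow"))
print(classify_visit("/landing",
                     "http://www.google.co.nz/search?q=stack+overflow"))
```

The gclid check comes first because it is part of the landing URL in your own log, whereas the referrer depends on what the browser chooses to send.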

This would theoretically allow you to then search for the keyword in your campaign, and figure out which adgroup they came from. Knowing the creative would probably be impossible though, unless you split test your landing URLs or tag them somehow.

Tom.iMallBrands
agreed that if the gclid was a genuinely opaque reference, that's the end of it. I'm pretty convinced it has some structure, however. What little information I had from Google seemed to imply this.
Draemon