I have recently started using Google Webmaster Tools.

I was quite surprised to see just how many links Google is trying to index.

http://www.example.com/?c=123
http://www.example.com/?c=82
http://www.example.com/?c=234
http://www.example.com/?c=991

These are all campaigns that exist as links from partner sites.

For right now they're all being denied by my robots file until the site is complete - as is EVERY page on the site.

I'm wondering what the best approach is for dealing with links like this, before I make my robots.txt file less restrictive.

I'm concerned that they will be treated as different URLs and start appearing in Google's search results. They all correspond to the same page, give or take. I don't want people finding them as they are and clicking on them.

My best idea so far is to render a page that contains a query string as follows:

 // DO NOT TRY THIS AT HOME. See edit below
 <% if (Request.QueryString.Count > 0) { %>

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

 <% } %>

Do I need to do this? Is this the best approach?

Edit: This turns out NOT TO BE A GOOD APPROACH. Google sees NOINDEX on a page that has the same content as another page that does not have NOINDEX. Apparently it figures they're the same thing and the NOINDEX takes precedence. My site completely disappeared from Google as a result. Caveat: it could have been something else I did at the same time, but I wouldn't risk this approach.

A: 

That seems like the best approach unless the page exists in its own folder, in which case you can modify the robots.txt file to exclude just that folder.
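
For the folder case, a robots.txt rule might look like the sketch below. The /campaigns/ path is a hypothetical folder name used for illustration; it does not come from the question.

```text
# Applies to all crawlers that respect robots.txt
User-agent: *
# Hypothetical folder holding the campaign landing pages
Disallow: /campaigns/
```

This only asks compliant crawlers not to fetch URLs under that folder; it does nothing for URLs that differ only by query string on a page outside it.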

Dave Anderson
do not try my own suggestion! see edit above
Simon_Weaver
+1  A: 

Yes, Google would interpret them as different URLs.

Depending on your web server you could use a rewrite filter to remove the parameter for search engines, e.g. a URL rewrite filter for Tomcat, or mod_rewrite for Apache.

Personally I'd just redirect to the same page with the tracking parameter removed.

Pool
I actually did do this originally, but then Google Analytics can't track the campaign ID. I'm going round in circles!
Simon_Weaver
You should redirect for just the crawlers.
Pool
+4  A: 

This is the sort of thing that rel="canonical" was designed for. Google posted a blog article about it.
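
As a rough illustration, assuming the campaign links all resolve to the page at http://www.example.com/ (an assumption based on the URLs in the question), each ?c= variant would include a tag like this in its head:

```html
<head>
  <!-- Assumed canonical URL: the same page without the ?c= campaign parameter -->
  <link rel="canonical" href="http://www.example.com/" />
</head>
```

The visitor's URL keeps its campaign parameter, so tracking scripts on the page can still read it, while search engines that honor the hint should treat the variants as one page.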

Samir Talwar
+1, canonical will help. Also check out "site maps", cf. http://en.wikipedia.org/wiki/Site_map .
Alex Martelli
Google states in the article "Is canonical a hint or a directive? It's a hint that we honor strongly." Do the other engines also honor this hint and if so as strongly?
Dave Anderson
Thanks. I love going to bed with a question and waking up with an answer with 3 votes already :-)
Simon_Weaver
@Dave: From the bottom of the article, I quote: "Updated: This link-tag is currently also supported by Ask.com, Microsoft Live Search and Yahoo!." I didn't see it there either until I, well, Googled for it just now and it popped up a link to it with the important parts highlighted.
Samir Talwar
A: 

For resources that should not be indexed I prefer to do a simple return in the page load:

// IsBot is a helper that is not shown in this answer
if (IsBot(Request.UserAgent))
    return;
mhenrixon