We're building a white-label site, which must not be indexed by Google.

Does anyone know a tool to check whether Googlebot will index a given URL?

I've put <meta name="robots" content="noindex" /> on all pages, so the site shouldn't be indexed. However, I'd rather be 110% certain and test it.

I know I could use robots.txt; however, the problem with robots.txt is as follows: our main site should be indexed, and it's the same application on IIS (ASP.Net) as the white-label site. The only difference is the URL.

I cannot modify robots.txt depending on the incoming URL, but I can add a meta tag to all pages from my code-behind, as in the sketch below.
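For reference, here is a rough sketch of that code-behind approach: a common base page that injects the meta tag only for the white-label host. The host name whitelabel.example.com is a placeholder, not our real domain.

using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

// Shared base page: every page inheriting from this gets the
// noindex meta tag when served under the white-label host.
public class BasePage : Page
{
    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);

        // Placeholder host name; the real check would use our white-label domain.
        if (Request.Url.Host.Equals("whitelabel.example.com", StringComparison.OrdinalIgnoreCase))
        {
            var robots = new HtmlMeta();
            robots.Name = "robots";
            robots.Content = "noindex";
            Page.Header.Controls.Add(robots); // requires <head runat="server">
        }
    }
}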

+4  A: 

You should add a robots.txt to your site.

However, the only foolproof way to prevent search engines from indexing a site is to require authentication. (Some spiders ignore robots.txt.)

EDIT: You need to add a handler for robots.txt that serves different content depending on the Host header.
You'll need to configure IIS to route requests for robots.txt through ASP.Net; the exact instructions depend on the IIS version.
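A minimal handler might look like the sketch below. The host name is a placeholder; you would also have to register the handler for the robots.txt path in web.config, and on older IIS versions map .txt requests to ASP.Net first.

using System;
using System.Web;

// Serves a host-specific robots.txt: block everything on the
// white-label host, allow everything on the main site.
public class RobotsHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";

        // Placeholder host name for the white-label site.
        if (context.Request.Url.Host.Equals("whitelabel.example.com", StringComparison.OrdinalIgnoreCase))
        {
            context.Response.Write("User-agent: *\nDisallow: /\n");
        }
        else
        {
            context.Response.Write("User-agent: *\nDisallow:\n");
        }
    }
}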

SLaks
The problem with robots.txt is as follows: our main site should be indexed, and it's the same application on IIS (ASP.Net) as the white-label site; the only difference is the URL. I cannot modify robots.txt depending on the incoming URL, but I can add a meta tag to all pages from my code-behind.
Steffen
You can make a dynamic robots.txt using ASP.Net.
SLaks
You could also use mod_rewrite (or something similar on IIS, such as the URL Rewrite module) to serve a different robots.txt depending on the current HTTP_HOST, as sketched below. But as said before, robots.txt is not safe anyway, because some spiders ignore it.
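Something along these lines, assuming IIS 7+ with the URL Rewrite module installed; the host name and the alternate file robots-noindex.txt are placeholders for illustration.

<!-- web.config: serve a blocking robots.txt variant on the white-label host -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="White-label robots.txt" stopProcessing="true">
        <match url="^robots\.txt$" />
        <conditions>
          <!-- Placeholder white-label host name -->
          <add input="{HTTP_HOST}" pattern="^whitelabel\.example\.com$" />
        </conditions>
        <action type="Rewrite" url="/robots-noindex.txt" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>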
Alex
True, I could set up IIS to send robots.txt through ASP.Net. However, with quite a few servers in the cluster, and those servers changing every so often, that would be a maintenance nightmare. I'll stick to the meta tag :-) It's not about concealing sensitive information; it's just to avoid duplicate content.
Steffen
+2  A: 

Google Webmaster Tools (google.com/webmasters/tools) will (besides letting you upload a sitemap) do a test crawl of your site and tell you what it crawled, how it rates for certain queries, and what it will and won't crawl.

The test crawl isn't automatically included in Google's results. In any case, if it's sensitive data, you can't count on that alone: put some authentication in the line of fire, no matter what.

ZJR
Thanks, that was exactly what I needed :-) Like I mentioned above, I'm not trying to conceal sensitive data; I just want to avoid duplicate content.
Steffen
Yep, white-label sites. An interesting concept.
ZJR
Indeed - and it's our first one too :-)
Steffen