Can anyone tell me what's wrong with this robots.txt?

http://bizup.cloudapp.net/robots.txt

The following is the error I get in Google Webmaster Tools:

Sitemap errors and warnings
Line    Status  Details
Errors  -   
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.

Actually, the link above is a route that maps to a Robots action. That action gets the file from storage and returns its content as text/plain. Google says it can't download the file. Could that setup be the reason?
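
For context, the route and action being described would look roughly like this in ASP.NET MVC (a minimal sketch only; RobotsController and GetRobotsFileFromStorage are hypothetical names, not the actual application code):

using System.Web.Mvc;

public class RobotsController : Controller
{
    // Serves /robots.txt by reading the file from storage and returning it as text/plain.
    public ActionResult Robots()
    {
        string robotsTxt = GetRobotsFileFromStorage(); // hypothetical helper
        return Content(robotsTxt, "text/plain");
    }

    private string GetRobotsFileFromStorage()
    {
        // Placeholder: the real application would read this from blob storage or similar.
        return "User-agent: *\nAllow: /\nSitemap: http://bizup.cloudapp.net/sitemap.xml";
    }
}

// In Global.asax.cs, a route such as:
routes.MapRoute("Robots", "robots.txt",
    new { controller = "Robots", action = "Robots" });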

+1  A: 

There is something wrong with the script that is generating the robots.txt file. When Googlebot accesses the file, it gets a 500 Internal Server Error. Here are the results of the header check:

REQUESTING: http://bizup.cloudapp.net/robots.txt
GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 INTERNAL SERVER ERROR
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
Final Destination Page

You can test the headers here: http://www.seoconsultants.com/tools/headers/#Report

Shaji
Yes, something was wrong. stevemegson helped me with it. Thanks!
Fabio Milheiro
+1  A: 

I have no problem getting your robots.txt:

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt

However, isn't that Sitemap line pointing recursively back to robots.txt itself?

A sitemap is supposed to be an XML file; see Wikipedia.
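
For reference, a minimal XML sitemap (with the site's homepage as a placeholder entry) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://bizup.cloudapp.net/</loc>
  </url>
</urlset>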

ring0
Yes, I already knew that, but it was a silly mistake on my part. Thanks! +1
Fabio Milheiro
+2  A: 

It looks like it's reading robots.txt OK, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when it's really http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml

EDIT

It actually goes a bit deeper than that, and Googlebot can't download any pages at all on your site. Here's the exception being returned when Googlebot requests either robots.txt or the homepage:

Cookieless Forms Authentication is not supported for this application.

Exception Details: System.Web.HttpException: Cookieless Forms Authentication is not supported for this application.

[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:\Projectos\AzureBrightWebRole\Global.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266

FormsAuthentication is trying to use cookieless mode because it recognises that Googlebot doesn't support cookies, but something in your FormsAuthentication_OnAuthenticate method is then throwing an exception because it doesn't want to accept cookieless authentication.
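
To illustrate the failure mode, the handler at Global.asax.cs:129 presumably does something along these lines (only a guess at its shape, since the actual code isn't shown; requires System.Web and System.Web.Security):

protected void FormsAuthentication_OnAuthenticate(object sender, FormsAuthenticationEventArgs args)
{
    // FormsAuthentication.CookiesSupported is false when ASP.NET falls back to
    // cookieless mode for a client it thinks can't accept cookies (such as
    // Googlebot). Throwing here is what produces the 500 response.
    if (!FormsAuthentication.CookiesSupported)
    {
        throw new HttpException("Cookieless Forms Authentication is not supported for this application.");
    }

    // ... custom cookie-based authentication would continue here ...
}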

I think that the easiest way around that is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...

<authentication mode="Forms"> 
    <forms cookieless="UseCookies" ...>
    ...
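
For reference, a fuller (hypothetical) version of that section might look like the following; loginUrl and timeout are placeholder values, not the application's actual settings:

<authentication mode="Forms">
  <!-- cookieless="UseCookies" forces cookie-based tickets only, so the
       cookieless fallback that triggers the exception is never attempted -->
  <forms loginUrl="~/Account/LogOn" timeout="2880" cookieless="UseCookies" />
</authentication>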
stevemegson
@stevemegson, now that looks like an answer! It all makes sense and I'm checking it out now... +1
Fabio Milheiro
@stevemegson, how did you manage to see the exception? I have been trying some so-called Googlebot simulators, but that exception doesn't happen with them.
Fabio Milheiro
Some Googlebot simulators use the headers from an old version of Googlebot, and for some reason only the latest version causes this problem. Google's Webmaster Tools has a 'Fetch as Googlebot' function in Labs which you can assume always matches the real Googlebot. Once you know the right headers to send, Fiddler lets you hand-craft an HTTP request and inspect the response, so I copied the request headers from Shaji's answer to see what came back. (http://www.fiddler2.com/)
stevemegson
Well, it appears that it works now. The sitemap has been submitted in Webmaster Tools and for on-demand indexing in my Google Custom Search Engine. Thanks a lot! I believe the cookieless tip saved me hours of research!
Fabio Milheiro