Say I have a site at http://website.com. I would really like to allow bots to see the home page, but every other page needs to be blocked, as it is pointless to spider. In other words,
http://website.com & http://website.com/ should be allowed, but
http://website.com/anything and http://website.com/someendpoint.ashx should be blocked.
Further...
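A sketch of a robots.txt that might achieve this, assuming the crawler honors the `Allow` directive and the `$` end-of-URL anchor (both are extensions supported by Googlebot and other major crawlers, not part of the original 1994 robots exclusion standard):

```
User-agent: *
Allow: /$
Disallow: /
```

Here `Allow: /$` matches exactly the root URL (`http://website.com/`), while `Disallow: /` blocks every other path; for Googlebot the more specific matching rule wins. Crawlers without wildcard support may ignore the `$` anchor, so behavior should be verified per crawler.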
I have a site with the following robots.txt in the root:
User-agent: *
Disabled: /
User-agent: Googlebot
Disabled: /
User-agent: Googlebot-Image
Disallow: /
And pages within this site are still being scanned by Googlebot all day long. Is there something wrong with my file, or with Google?
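For comparison, the directive defined by the robots exclusion standard is `Disallow`, not `Disabled`; a file intended to block all crawlers from everything would normally read:

```
User-agent: *
Disallow: /
```

Crawlers ignore directives they do not recognize, so an unrecognized directive name leaves the site effectively unrestricted.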
...
Our SEO team would like to open up our main dynamic search results page to spiders and remove the 'nofollow' from the meta tags. The page is currently accessible to spiders because its path is allowed in robots.txt, but a 'nofollow' clause in the meta tag prevents spiders from going beyond the first page.
<meta name="robots" content="in...
Hi,
Just wanted to know if it is possible to disallow the whole site for crawlers and allow only specific webpages or sections?
Is "allow" supported by crawlers like FAST and Ultraseek?
Kind regards,
...
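Regarding the question above about disallowing the whole site while allowing specific sections: one common sketch, assuming the crawler honors the non-standard `Allow` directive (Googlebot does; support in FAST and Ultraseek would need to be checked against their own documentation). The `/public/` and `/about.html` paths here are hypothetical placeholders:

```
User-agent: *
Allow: /public/
Allow: /about.html
Disallow: /
```

For Googlebot the most specific (longest) matching rule wins, so the `Allow` lines override the blanket `Disallow`. Crawlers that do not implement `Allow` will see only the `Disallow: /` and block everything.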
For some reason when I check on Google Webmaster Tool's "Analyze robots.txt" to see which urls are blocked by our robots.txt file, it's not what I'm expecting. Here is a snippet from the beginning of our file:
Sitemap: http://[omitted]/sitemap_index.xml
User-agent: Mediapartners-Google
Disallow: /scripts
User-agent: *
Disallow: /scrip...
I've been thinking a while about disallowing every crawler except Ask, Google, Microsoft, and Yahoo! from my site.
The reasoning behind this is that I've never seen any traffic being generated by any of the other web-crawlers out there.
My questions are:
Is there any reason not to?
Has anybody done this?
Did you notice any negative e...
In the past, one of our IT specialists accidentally moved the robots.txt from staging to production, blocking Google and others from indexing our customers' site in production. Is there a good way of managing this situation?
Thanks in advance.
...
Short question:
Has anybody got any C# code to parse robots.txt and then evaluate URLs against it, to see whether they would be excluded or not?
Long question:
I have been creating a sitemap for a new site yet to be released to Google. The sitemap has two modes: a user mode (like a traditional sitemap) and an 'admin' mode.
The admin mode ...
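The question above asks for C#, but as a sketch of the evaluation logic, Python's standard library ships a robots.txt parser (`urllib.robotparser`) that does exactly this; the rules and URLs below are hypothetical examples:

```python
# Parse a robots.txt and evaluate URLs against it using the stdlib parser.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) returns True if the URL is not excluded.
print(parser.can_fetch("*", "http://example.com/index.html"))   # True
print(parser.can_fetch("*", "http://example.com/admin/users"))  # False
```

The same prefix-matching logic would port to C# in a few dozen lines, or `RobotFileParser.set_url()` can fetch a live robots.txt instead of parsing a string.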
I have links with this structure:
http://www.example.com/tags/blah
http://www.example.com/tags/blubb
http://www.example.com/tags/blah/blubb (for all items that match BOTH tags)
I want Google & co. to spider all links that have ONE tag in the URL, but NOT the URLs that have two or more tags.
Currently I use the html meta tag "robots" ...
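One possible robots.txt sketch, relying on Googlebot's wildcard extension (`*` is not part of the original robots.txt standard, so other crawlers may ignore the rule): a pattern requiring a second path segment under /tags/ matches only the multi-tag URLs:

```
User-agent: Googlebot
Disallow: /tags/*/
```

Here /tags/blah has no second segment and stays crawlable, while /tags/blah/blubb matches `/tags/*/` as a prefix and is blocked.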
Hi,
in my robots.txt file, I have the following line
User-agent: Googlebot-Mobile
Disallow: /
User-agent: Googlebot
Disallow: /
Sitemap: http://mydomain.com/sitemapindex.xml
I know that with the first four lines Googlebot won't index the site, but if I add the last line, Sitemap: http://mydomain.com/sitemapindex.xml, will go...
What would the syntax be to block all bot access to https:// pages? I have an old site that no longer has an SSL certificate, and I want to block access to all https:// pages.
...
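On the https:// question above: robots.txt cannot distinguish schemes within one file, because each scheme/host combination fetches its own /robots.txt. A common sketch (assuming Apache with mod_rewrite; the filename is hypothetical) is to serve a separate, deny-all file for HTTPS requests:

```
# .htaccess: serve a different robots file when the request arrives over SSL
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ /robots_https.txt [L]
```

where robots_https.txt contains just `User-agent: *` followed by `Disallow: /`, while the plain-HTTP robots.txt stays permissive.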
Help, help! Google indexed a test folder on my website which no one but me was supposed to know about! How do I restrict Google from indexing links and certain folders?
...
I have a serious question. I'm not trying to start a flamewar or to incite any violence--but here goes.
Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind:
1.) If someone puts a web site up they're expecting some visits. Granted, web crawlers are using...
I am currently generating a sitemap file dynamically using an HttpHandler, with a path set to sitemap.axd. This then returns XML content. No one at my office is certain whether all search engines accept this extension or whether they need .xml to parse it. I know that I can submit it to Google through the webmaster tools and use robots.txt to indicate...
Using robots.txt, is it possible to restrict robot access for (specific) query string (parameter) values?
ie
http://www.url.com/default.aspx #allow
http://www.url.com/default.aspx?id=6 #allow
http://www.url.com/default.aspx?id=7 #disallow
...
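For the query-string question above, a hedged sketch relying on Googlebot's `$` end-of-URL extension (not universally supported), with one line per disallowed value:

```
User-agent: *
Disallow: /default.aspx?id=7
```

Note that robots.txt matching is prefix-based, so this line as written would also block ?id=70, ?id=71, and so on; for crawlers that support it, anchoring the rule as `Disallow: /default.aspx?id=7$` restricts the match to exactly id=7.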
I've got a bunch of ajaxified links that do things like vote up, vote down, flag a post - standard community moderation stuff.
Problem is that the googlebot crawls those links, and votes up, votes down, and flags items.
Will adding this to robots.txt prevent the googlebot from crawling those links? Or is there something else I need to...
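A robots.txt sketch for this, assuming the moderation handlers share a common path prefix (the /vote/ path here is a hypothetical placeholder):

```
User-agent: *
Disallow: /vote/
```

Worth noting: robots.txt only asks well-behaved crawlers not to fetch those URLs. The more robust fix is to trigger state-changing actions via POST (or require a logged-in session), since per HTTP semantics GET requests are supposed to be safe, and crawlers follow GET links.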
I have found a post http://stackoverflow.com/questions/999056/ethics-of-robots-txt/999088#999088 discussing a matter of robots.txt on web sites. Generally, I agree with the principles. However, there are commercial tools checking Google positions by - very likely - scraping Google for results, due to the lack of an API (in case someone doesn't ...
Hi,
We have a sitemap at our site, http://www.gamezebo.com/sitemap.xml
Some of the urls in the sitemap are being reported in Webmaster Central as being blocked by our robots.txt (see gamezebo.com/robots.txt), although these urls are not Disallowed in robots.txt. There are other such urls as well, for example, gamezebo.com/gamelin...
How to disallow all dynamic urls in robots.txt?
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
I want to disallow all things that st...
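A sketch that collapses the rules above: robots.txt rules are plain prefix matches, so since every one of those dynamic URLs begins with /?q=, a single line covers them all, with no wildcard needed:

```
User-agent: *
Disallow: /?q=
```

The trade-off is that this also blocks any other /?q= URL, including ones you might want crawled, so it only fits if all content pages have non-query URLs.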
I need to disallow indexing of 2 pages, one of them dynamic:
site.com/news.php
site.com/news.php?id=__
site.com/news-all.php
What should I write in robots.txt:
User-agent: *
Disallow: /news
or
Disallow: /news*
or
Disallow: /news.php*
Disallow: /news-all.php
Should one use wildcard in the end or not?
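A sketch of how the matching plays out: robots.txt rules are plain prefix matches, so no trailing wildcard is needed, and `Disallow: /news.php` already covers /news.php?id=__ as well. The risk with the shorter `Disallow: /news` is that it also blocks /news-all.php and any other path beginning with /news:

```
User-agent: *
Disallow: /news.php
Disallow: /news-all.php
```

Trailing `*` is redundant here even for crawlers that support wildcards, since every rule already matches as a prefix.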
...