In only one search engine, I want to get http://mysite.com/ indexed,
not http://mysite.com/index.php. I only want to allow indexing of the
main page of the website, and nothing more. I do not want the bot to
follow any of the links on the main page.
My meta tags include the following:
<meta name="robots" content="index, nofollo...
Hi all, we have moved our website to a new domain and want all pages of the old site to be removed from search engines. It's the same site and the same content, just on a new domain, so search engines are taking time to update (maybe because of duplicate content). We have added an .htaccess 301 redirect from our old site to the new site as: redirect 301 / http://new-domain.com/
N...
Will this robots.txt file only allow Googlebot to index my site's index.php file? Caveat: I have an .htaccess redirect so that people who type in
http://www.example.com/index.php
are redirected to simply
http://www.example.com/
So, this is my robots.txt file content...
User-agent: Googlebot
Allow: /index.php
Disallow: /
User-agent: ...
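Rules like these can be checked locally with Python's standard `urllib.robotparser`. What follows is a minimal sketch, assuming only the Googlebot group quoted above (the rest of the file is truncated in the post and is not reconstructed here). Note that this parser applies the first matching rule, while Google documents a most-specific-match rule; for these two lines both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Only the Googlebot group quoted in the post; the remainder is truncated.
RULES = """\
User-agent: Googlebot
Allow: /index.php
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# /index.php matches the Allow line, so it may be fetched.
index_allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.php")
# The bare root only matches "Disallow: /", so it may not be fetched.
root_allowed = parser.can_fetch("Googlebot", "http://www.example.com/")

print(index_allowed, root_allowed)
```

Under these assumptions the root URL itself is blocked, which matters here because the redirect sends /index.php to the root.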
Using a redirect statement in my .htaccess file, people who type the following into the address bar...
http://example.com/index.php
...are redirected to...
http://example.com/
I also have a noindex, nofollow meta tag on all my website's pages.
My question is, given that redirect behavior and meta data, will googlebot index my mainp...
I am about to create a robots.txt file.
I am using notepad.
How should I save the file? UTF8, ANSI or what?
Also, should it be a capital R?
And in the file, I am specifying a sitemap location. Should this be with a capital S?
User-agent: *
Sitemap: http://www.domain.se/sitemap.xml
Thanks
...
I have a client whose domain seems to be getting hit pretty hard by what appears to be a DDoS. In the logs it's normal looking user agents with random IPs but they're flipping through pages too fast to be human. They also don't appear to be requesting any images. I can't seem to find any pattern and my suspicion is it's a fleet of Window...
I need guidance on using robots.txt. The problem is as follows.
I have one live website, "www.faisal.com" or "faisal.com", and two testing web servers:
"faisal.jupiter.com" and "faisal.dev.com"
I want one robots.txt to handle all of this. I don't want crawlers to index pages from "faisal.jupiter.com" and "faisal.dev.com"...
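Crawlers request /robots.txt separately for each hostname, so one shared codebase can return a different body depending on the Host header. The sketch below is framework-agnostic and hypothetical: `robots_txt_for`, `STAGING_HOSTS`, and the exact-match comparison are assumptions, and only the hostnames come from the post.

```python
# Hostnames are taken from the post; exact matching is an assumption.
STAGING_HOSTS = {"faisal.jupiter.com", "faisal.dev.com"}

BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # ask every crawler to stay out
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow blocks nothing


def robots_txt_for(host: str) -> str:
    """Return the robots.txt body to serve for a given Host header value."""
    # Drop an optional port and normalise case before comparing.
    hostname = host.split(":")[0].strip().lower()
    return BLOCK_ALL if hostname in STAGING_HOSTS else ALLOW_ALL
```

A request handler for /robots.txt could call `robots_txt_for` with the incoming Host header and return the result as text/plain.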
I am using the Spring Framework. The following is the mapping of URLs to controllers:
<bean id="urlMapping" class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
<property name="mappings">
<props>
<prop key="/controller.web">webController</prop>
<prop key="/robots.txt">robotsController</prop>
</props>
</property>
</bean>
Whe...
In the robots.txt file, I am about to disallow some sections of my site.
For instance, I don't want my "terms and conditions" to be indexed by search engines.
User-agent: *
Disallow: /terms
The real path to the file is actually
/data/terms_and_conditions.html
But I have used .htaccess to rewrite the URL.
Now to my Q, should I ...
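robots.txt rules are matched against the URL path a crawler requests, not against the file path on disk. A minimal sketch with Python's `urllib.robotparser`, assuming `/terms` is the public (rewritten) URL; the host `www.example.com` and the agent name `ExampleBot` are placeholders, not from the post.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /terms"])

# Placeholder host; only the two paths come from the post.
public_url = "http://www.example.com/terms"
internal_url = "http://www.example.com/data/terms_and_conditions.html"

# The rule is a prefix match on the requested path.
public_blocked = not parser.can_fetch("ExampleBot", public_url)
internal_blocked = not parser.can_fetch("ExampleBot", internal_url)

print(public_blocked, internal_blocked)
```

With only this rule, the rewritten URL is covered while the underlying file path is not, so the internal path would need its own Disallow line if it is also reachable from the web.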
I need to have control over what URLs are allowed to be indexed. To do this I want to allow google to index only URLs that are listed in my Sitemap(s), and disallow Google from indexing anything else.
The easiest solution would be a way to configure robots.txt to disallow everything:
User-agent: *
Disallow: /
And ...
Hello All,
I have www.domainname.com and origin.domainname.com pointing to the same codebase. Is there a way I can prevent all URLs under origin.domainname.com from getting indexed?
Is there some rule in robots.txt to do it? Both URLs point to the same folder.
Also, I tried redirecting origin.domainname.com to www.domainn...
In this webpage:
http://www.alvolante.it/news/pompe_benzina_%E2%80%9Ctruccate%E2%80%9D_autostrada-308391044
there is this image:
http://immagini.alvolante.it/sites/default/files/imagecache/anteprima_100/images/rifornimento_benzina.jpg
Why is this image indexed if robots.txt contains "Disallow: /sites/"?
You can see that is ...
Hi,
I'm running a site which allows users to create subdomains. I'd like to submit these user subdomains to search engines via sitemaps. However, according to the sitemaps protocol (and Google Webmaster Tools), a single sitemap can include URLs from a single host only.
What is the best approach?
At the moment I've the following stru...
Any ideas how I can block Alexa Toolbar users? I don't want to appear in the rankings while we are in beta ...
I see you can block their search engine with
User-agent: ia_archiver
Disallow: /
but I can't find any documentation on how to pull yourself from actually being ranked.
I read earlier someone tried to email them and they r...
Hi,
I have a site with some restricted content. I want my site to appear in search results, but I do not want the content itself to become public.
Is there a way to allow crawlers to crawl my site but prevent them from making the content public?
The closest solution I have found is Google First Click Free but even it requires me to show the c...
I would like to deny web robots access to a URL like this:
http://www.example.com/export
while still allowing this kind of URL:
http://www.example.com/export?foo=value1
A spider bot is calling /export without a query string, causing a lot of errors in my log.
Is there a way to express this filter in robots.txt?
...
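Google's documentation and RFC 9309 describe two special characters in path patterns: `*` (any run of characters) and `$` (end of the URL). A rule such as `Disallow: /export$` would therefore cover /export alone but not /export?foo=value1, for crawlers that honour these characters. Python's standard parser does not implement them, so the sketch below uses a small hypothetical helper, `rule_matches`, to illustrate the matching logic only.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query.

    Supports '*' (any run of characters) and a trailing '$' (end of URL).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix matching by default.
    return re.match(regex, path) is not None


print(rule_matches("/export$", "/export"))
print(rule_matches("/export$", "/export?foo=value1"))
```

Without the trailing `$`, the pattern `/export` is a plain prefix and would also match the query-string form.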
Hello,
Here is the copy of my robots.txt content
Sitemap: http://www.go4film.com/sitemap.xml
User-agent: ia_archiver
Disallow: /
User-agent: robtexbot
Disallow: /
User-agent: Googlebot
Allow: /
Here I only allow Googlebot and block Alexa and everything else.
So could someone please tell me if I block search engines like Yahoo, Ask, Bing ...
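A crawler that matches no named group, in a file with no `User-agent: *` group, is unrestricted by default. The sketch below checks this with Python's `urllib.robotparser`, using the rules quoted above; `bingbot` is used as an example agent token and is an assumption, as is the root URL being tested.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the post.
RULES = """\
Sitemap: http://www.go4film.com/sitemap.xml

User-agent: ia_archiver
Disallow: /

User-agent: robtexbot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.go4film.com/"
google_allowed = parser.can_fetch("Googlebot", url)
alexa_allowed = parser.can_fetch("ia_archiver", url)
# No group names this agent and there is no "*" group, so nothing blocks it.
bing_allowed = parser.can_fetch("bingbot", url)

print(google_allowed, alexa_allowed, bing_allowed)
```

To block every crawler except Googlebot, the file would also need a `User-agent: *` group with `Disallow: /`.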
Facebook's developer principles and policies and the general terms of use seem to forbid automated data collection, but graph.facebook.com/robots.txt seems to allow it:
User-agent: *
Disallow:
Does anybody know how to make sense of this?
...
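An empty `Disallow:` value disallows nothing, so these two lines place no crawl restrictions on any agent. A minimal check with Python's `urllib.robotparser`; the agent name `SomeBot` and the path are hypothetical, and only the two rule lines come from the post.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow:"])

# Hypothetical agent and path; an empty Disallow matches no URL.
allowed = parser.can_fetch("SomeBot", "http://graph.facebook.com/some/path")

print(allowed)
```

This only shows what the crawl rules say. RFC 9309 notes that robots.txt rules are not a form of access authorization, so the file does not settle what the written policies permit.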