I have been searching around using Google but I can't find an answer to this question.
A robots.txt file can contain the following line:
Sitemap: http://www.mysite.com/sitemapindex.xml
but is it possible to specify MULTIPLE sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps r...
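For context, the sitemaps.org protocol does permit more than one Sitemap line per robots.txt file, each on its own line; a minimal sketch (the two index filenames here are hypothetical):

```
Sitemap: http://www.mysite.com/sitemapindex1.xml
Sitemap: http://www.mysite.com/sitemapindex2.xml
```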
Is there any robots.txt parser written in JavaScript? I'm usually coding in Python, and there's robotparser: http://docs.python.org/library/robotparser.html, which is very easy to use.
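For reference, a minimal sketch of the Python workflow the question refers to (shown under the modern module name urllib.robotparser; the rules are fed in directly via parse() so no network access is needed, and the rule set is purely illustrative):

```python
import urllib.robotparser

# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

# Feed the rules in as lines instead of fetching them over HTTP.
parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "http://www.example.com/admin/secret.html"))  # False
print(parser.can_fetch("*", "http://www.example.com/public/page.html"))   # True
```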
...
I'm not an expert on robots.txt, and I have the following in one of my clients' robots.txt files:
User-agent: *
Disallow:
Disallow: /backup/
Disallow: /stylesheets/
Disallow: /admin/
I am not sure about the second line. Does it disallow all spiders?
...
Hi,
I have a site where users can enter their profile and password-protect certain details. I would like search engines to crawl the 'unprotected' parts of each profile (which vary from user to user), similar to how a user's Facebook profile comes up in the search results when you search for their name. Do I have to do anything ...
Ok, I understand the title didn't make any sense, so here I've tried to explain it in detail.
I'm using a hosting plan that gives me space for my domain and lets me "add on" other domains to it. So let's say I have a domain A, and I add on a domain B. Basically my hosting gives me a public_html where I can put stuff that shows when someone vis...
We implemented a rating system on a site a while back that involves a link to a script. However, with the vast majority of ratings on the site at 3/5 and the rest spread very evenly across 1-5, we're beginning to suspect that search engine crawlers etc. are getting through. The URLs used look like this:
http://www.thesite.com/path/to/the/page/...
Or even forbid indexing the whole site?
UPDATE
Is the space after : mandatory in robots.txt?
...
In VS2008, when I do a Find in Files across an entire website that includes a robots.txt file, I notice that the search pauses for 20-30 seconds on robots.txt.
Any ideas how to resolve this issue?
...
What does a robots.txt file do in a PHP project?
...
I have the following robots.txt:
User-agent: *
Disallow: /images/
Sitemap: http://www.example.com/sitemap.xml
and the following robotparser code:
import robotparser
import urlparse

def init_robot_parser(URL):
    robot_parser = robotparser.RobotFileParser()
    robot_parser.set_url(urlparse.urljoin(URL, "robots.txt"))
    robot_parser.read()
    return robot_parser
But when...
Where to put robots.txt?
domainname.com/robots.txt
or
domainname/public_html/robots.txt
I placed it here:
domainname.com/robots.txt
but it does not open when I type this address in the browser.
...
I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all.
Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3
I created my own CMS, and this is simply a break...
Hi,
I am using ASP.NET with C#.
To increase the searchability of my site in Google, I have searched and found out that I can do it by using my robots.txt, but I really don't have any idea how to create it or where I can place tags like 'asp.net, C#' in the txt file.
Also, what are the necessary steps to include it in my application?
Plea...
Hi there!
I have this drupal website that revolves around a document database. By design you can only find these documents by searching the site. But I want all the results to be indexed by Googlebot and other crawlers, so I was thinking, what if I make a page that lists all the documents, and then tell the robots to visit the page to i...
Hi,
I want to know how to parse robots.txt in Java.
Is there any existing code for this?
Thanks in advance
...
I've been asked to figure out how to maximize the visibility of an upcoming web application that will initially be available in multiple languages, specifically French and English.
I am interested in understanding how robots, like Googlebot, crawl a site that is available in multiple languages.
I have a few questions concerning the...
What I am trying to do is take a list of URLs and download each URL's content (for indexing). The biggest problem is that if I encounter a link to something like a Facebook event that simply redirects to the login page, I need to be able to detect and skip that URL. It seems as though the robots.txt file is there for this purpose....
My URL structure is set up in two parallel forms (both lead to the same place):
www.mydomain.com/subname
www.mydomain.com/123
The trouble is that the spiders are crawling into things like:
www.mydomain.com/subname/default_media_function
www.mydomain.com/subname/map_function
Note that the name "subname" represents thousands of dif...
Welcome,
How can I disallow indexing of the following pages in robots.txt?
http://mysite.net/something,category1.php
http://mysite.net/something,category2.php
(...)
http://mysite.net/something,category152.php
I have tried
Disallow: /something,*.php
But it says I can't use a wildcard (*) here.
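For context, the original robots.txt standard matches Disallow values as plain URL prefixes, so a pattern like the one above can often be rewritten without a wildcard; a minimal sketch, assuming all of these category pages share the /something, prefix:

```
User-agent: *
Disallow: /something,
```

The * wildcard is a later extension supported by major crawlers such as Googlebot, but tools that follow the original standard will reject it.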
...
Hi,
I want to disallow all files and folders on my site from search engine bots, except one special folder and the files in it.
Can I use these lines in my robots.txt file?
User-agent: *
Disallow: /
Allow: /thatfolder
Is this right?
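For context, Allow is an extension to the original robots.txt standard, though it is understood by Googlebot and other major crawlers; one commonly cited form of the pattern in the question places the Allow line first, for parsers that stop at the first matching rule:

```
User-agent: *
Allow: /thatfolder
Disallow: /
```

Crawlers that only implement the original standard may ignore the Allow line entirely and block the whole site.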
...