robots.txt

Multiple Sitemap: entries in robots.txt?

I have been searching around using Google but I can't find an answer to this question. A robots.txt file can contain the following line: Sitemap: http://www.mysite.com/sitemapindex.xml, but is it possible to specify MULTIPLE sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps r...
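For reference, the sitemaps.org protocol does allow more than one Sitemap: line in a single robots.txt; a minimal sketch, with placeholder file names:

    Sitemap: http://www.mysite.com/sitemapindex1.xml
    Sitemap: http://www.mysite.com/sitemapindex2.xml

Each entry is read independently, so crawlers that support the directive should fetch both.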

Parsing robots.txt with JavaScript?

Is there any robots.txt parser written in JavaScript? I'm usually coding in Python, where there's robotparser: http://docs.python.org/library/robotparser.html, which is very easy to use. ...
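For comparison, the Python usage the asker has in mind looks roughly like this (a minimal sketch; the URL is a placeholder):

    import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()  # fetch and parse the file
    print(rp.can_fetch("*", "http://www.example.com/some/page"))

A JavaScript equivalent would need to replicate this can_fetch-style prefix matching against the parsed Disallow rules.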

Robots.txt syntax

I'm not an expert on robots.txt, and I have the following in one of my clients' robots.txt:

    User-agent: *
    Disallow:
    Disallow: /backup/
    Disallow: /stylesheets/
    Disallow: /admin/

I am not sure about the second line. Does this line disallow all spiders? ...
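For what it's worth, an empty Disallow: value means "nothing is disallowed", the opposite of blocking everything; compare the two forms:

    Disallow:     # empty value: allows all crawling
    Disallow: /   # slash: blocks the whole site

So the second line in the file above is harmless; the lines naming /backup/, /stylesheets/ and /admin/ do the actual blocking.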

Allow search engines to crawl usernames

Hi, I have a site where users can enter their profile and password-protect certain details. I would like search engines to crawl the 'unprotected' parts of each profile (which vary from user to user), similar to how, if you search for a user's name on Facebook, their Facebook profile comes up in the search results. Do I have to do anything ...

Htaccess/robots.txt to allow search bots to explore the main domain but not a directory on another domain

OK, I understand the title didn't make any sense, so here I've tried to explain it in detail. I'm using a host that gives me space for my domain and lets me "add on" other domains to it. So let's say I have a domain A, and I add on a domain B. Basically my host gives me a public_html where I can put stuff that shows when someone vis...
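One common approach, assuming the add-on domain B is mapped to a subdirectory of public_html (the directory name below is hypothetical): each domain serves files from its own root, so a separate, restrictive robots.txt dropped into B's directory is what bots see at domain B's /robots.txt:

    # public_html/domainB/robots.txt  (hypothetical path)
    User-agent: *
    Disallow: /

The main domain's public_html/robots.txt stays permissive and is unaffected.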

Disallow a certain URL in robots.txt

We implemented a rating system on a site a while back that involves a link to a script. However, with the vast majority of ratings on the site at 3/5 and the ratings very even across 1-5, we're beginning to suspect that search engine crawlers etc. are getting through. The URLs used look like this: http://www.thesite.com/path/to/the/page/...
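Since Disallow rules match by URL prefix, a hedged sketch, assuming the rating links share a common path segment (the /rate/ segment below is invented for illustration):

    User-agent: *
    Disallow: /rate/

Well-behaved crawlers would then skip any URL beginning with /rate/, though robots.txt will not stop bots that ignore it.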

How do I forbid search engines from indexing the subdirectory /CRM with robots.txt?

Or even forbid indexing of the whole site? UPDATE: Is the space after the colon mandatory in robots.txt? ...
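A minimal sketch covering both cases (the /CRM name comes from the question; note that robots.txt asks crawlers not to fetch pages, which is not quite the same as guaranteeing they are never indexed):

    # Block just the subdirectory:
    User-agent: *
    Disallow: /CRM/

    # Or block the entire site:
    User-agent: *
    Disallow: /

As for the update: in the original robots exclusion draft the whitespace after the colon is optional, so Disallow:/CRM/ is also accepted by mainstream parsers.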

VS2008 Find in Files slows down on robots.txt

In VS2008, when I do a Find in Files across the entire website, which includes a robots.txt file, I notice that the search pauses for 20-30 seconds on robots.txt. Any ideas how to resolve this issue? ...

What does the robots.txt file do in a PHP project?

What does the robots.txt file do in a PHP project? ...

Python's robotparser ignoring sitemaps

I have the following robots.txt:

    User-agent: *
    Disallow: /images/
    Sitemap: http://www.example.com/sitemap.xml

and the following robotparser code:

    import robotparser
    import urlparse

    def init_robot_parser(URL):
        robot_parser = robotparser.RobotFileParser()
        robot_parser.set_url(urlparse.urljoin(URL, "robots.txt"))
        robot_parser.read()
        return robot_parser

But when...
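One likely explanation: Python 2's robotparser only understands User-agent/Disallow rules and silently skips Sitemap: lines, so they have to be extracted by hand. A minimal sketch using only the standard library:

    import urllib2

    def get_sitemaps(robots_url):
        # Collect the Sitemap: entries that robotparser ignores.
        sitemaps = []
        for line in urllib2.urlopen(robots_url):
            line = line.strip()
            if line.lower().startswith("sitemap:"):
                sitemaps.append(line.split(":", 1)[1].strip())
        return sitemaps

    print(get_sitemaps("http://www.example.com/robots.txt"))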

Where to put the robots.txt file?

Where should I put robots.txt: domainname.com/robots.txt or domainname/public_html/robots.txt? I placed it at domainname.com/robots.txt, but it doesn't open when I type this in the browser ...

SEO Help with Pages Indexed by Google

I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all. Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3 I created my own CMS, and this is simply a break...
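If those profile pages should stay out of the index entirely, one hedged option is a prefix Disallow (the path is taken from the question; Disallow matches any URL beginning with the given prefix, query string included):

    User-agent: *
    Disallow: /wow/profile.php

That would cover /wow/profile.php?id=3 and every other id, though pages already indexed may take time to drop out.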

My robots.txt file in a web application

Hi, I am using ASP.NET with C#. To increase the searchability of my site in Google, I have searched and found out that I can do it by using robots.txt, but I really don't have any idea how to create it, or where I can place tags like 'asp.net, C#' in my txt file. Also, what are the necessary steps to include it in my application? Plea...

Can I tell site crawlers to visit a certain page?

Hi there! I have this Drupal website that revolves around a document database. By design you can only find these documents by searching the site. But I want all the results to be indexed by Googlebot and other crawlers, so I was thinking: what if I make a page that lists all the documents, and then tell the robots to visit the page to i...

robots.txt parser in Java

Hi, I want to know how to parse robots.txt in Java. Is there already any code for this? Thanks in advance ...

How to maximize the visibility of a multilingual web site?

I've been asked to work out how to maximize the visibility of an upcoming web application that is initially available in multiple languages, specifically French and English. I am interested in understanding how robots, like the Google bot, crawl a site that is available in multiple languages. I have a few questions concerning the...

Scraping websites in Java

What I am trying to do is take a list of URLs and download each URL's content (for indexing). The biggest problem is that if I encounter a link that is something like a Facebook event that simply redirects to the login page, I need to be able to detect and skip that URL. It seems as though the robots.txt file is there for this purpose....

Specifying variables in robots.txt

My URL structure is set up in two parallels (both lead to the same place): www.mydomain.com/subname and www.mydomain.com/123. The trouble is that the spiders are crawling into things like: www.mydomain.com/subname/default_media_function www.mydomain.com/subname/map_function Note that the name "subname" represents thousands of dif...
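A hedged sketch using the * wildcard, which is not part of the original robots.txt standard but is honored by Googlebot and other major crawlers (the function names are taken from the question):

    User-agent: *
    Disallow: /*/default_media_function
    Disallow: /*/map_function

The leading /*/ matches any first path segment, so every "subname" variant is covered without listing them all.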

Disallow dynamic .htaccess-rewritten URLs

Hello, how can I disallow indexing of the pages http://mysite.net/something,category1.php http://mysite.net/something,category2.php (...) http://mysite.net/something,category152.php in robots.txt? I have tried Disallow: /something,*.php but it says I can't use a wildcard (*) here. ...
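The validator is right that the original robots.txt standard has no wildcards; Disallow matches plain URL prefixes. Assuming the literal prefix really is fixed as in the examples, a standard-compliant sketch:

    # Prefix match: blocks every URL starting with /something,
    User-agent: *
    Disallow: /something,

Googlebot and some other crawlers additionally understand * as a non-standard extension, so Disallow: /something,*.php would also work for them specifically.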

Allow and Disallow in robots.txt file

Hi, I want to disallow all files and folders on my site to search engine bots, except one special folder and the files in it. Can I use these lines in the robots.txt file?

    User-agent: *
    Disallow: /
    Allow: /thatfolder

Is this right? ...
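For context, Allow is not part of the original robots exclusion standard, but major crawlers such as Googlebot honor it; Google resolves conflicts by the most specific (longest) matching rule, so the order above works for it. Since some older parsers instead stop at the first matching rule, a slightly safer hedged variant puts the Allow line first:

    User-agent: *
    Allow: /thatfolder/
    Disallow: /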