views: 310 · answers: 4

I need to prevent http://example.com/startup?page=2 search pages from being indexed. I want http://example.com/startup to be indexed, but not http://example.com/startup?page=2, page=3, and so on.

Edit: I forgot to add another detail: the startup part can be anything, e.g. http://example.com/XXXXX?page

Thanks in advance.

+2  A: 

I believe something like this should work out, though I have not actually tested it.

User-Agent: *
Disallow: /startup?page=
Disallow: page=
Disallow: ?page=

From the robots.txt spec, on Disallow: "The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved."
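
To sanity-check which of those URLs a spec-compliant crawler would skip, here is a minimal sketch using Python's standard urllib.robotparser. It implements only the original prefix matching quoted above (no Google-style wildcards), and the example URLs are just the placeholders from the question.

# Check the rules above with Python's stdlib parser, which only
# does the original prefix matching (no wildcard support).
from urllib.robotparser import RobotFileParser

rules = """\
User-Agent: *
Disallow: /startup?page=
Disallow: page=
Disallow: ?page=
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for url in ("http://example.com/startup",
            "http://example.com/startup?page=2",
            "http://example.com/XXXXX?page=2"):
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")

# /startup stays allowed and /startup?page=2 is blocked by the first rule,
# but /XXXXX?page=2 stays allowed: "page=" and "?page=" are not
# leading-slash prefixes, and the original spec has no wildcards.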

meder
Thanks for the answer. I forgot to add another detail: the startup part can be random, e.g. /XXXXX?page
pmarreddy
How many subdirectories do you have? You could try /?page=, or add each subdirectory and use /subdirectory?page=
meder
I found this: "User-agent: *" followed by "Disallow: /*page=". Is this right?
pmarreddy
Try: Disallow: page= or Disallow: ?page=
meder
I added both, thanks.
pmarreddy
Can you add the comment text to the answer so that it will be helpful for other people?
pmarreddy
+1  A: 

This should do the trick:

User-agent: *
Allow: /startup
Disallow: /startup?page=*
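
The trailing wildcard and the Allow line rely on Google's extensions rather than the original spec, so not every crawler honors them. As a rough way to see which URLs a wildcard rule would cover, here is a sketch in Python; pattern_matches is a hypothetical helper, and it ignores the Allow/Disallow precedence that real crawlers apply.

import re
from urllib.parse import urlparse

def pattern_matches(pattern: str, url: str) -> bool:
    # Hypothetical helper approximating wildcard matching: '*' matches
    # any run of characters, a trailing '$' anchors the end of the URL.
    parsed = urlparse(url)
    target = parsed.path + ("?" + parsed.query if parsed.query else "")
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, target) is not None

print(pattern_matches("/startup?page=*", "http://example.com/startup?page=2"))  # True
print(pattern_matches("/startup?page=*", "http://example.com/startup"))         # False
print(pattern_matches("/*?page=", "http://example.com/XXXXX?page=2"))           # True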
Adam
Thanks for the answer. I forgot to add another detail: the startup part can be random, e.g. /XXXXX?page
pmarreddy
Does the 'Allow:' even serve a purpose? And I don't think you need the wildcard.
meder
If for whatever reason he's disallowed /, then the Allow line re-allows /startup. Better safe than sorry.
Adam
+1  A: 

You can put this on the pages you do not want indexed:

<META NAME="ROBOTS" CONTENT="NONE">

This tells robots not to index the page.

On a search page, it may be more useful to use:

<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

This instructs robots to not index the current page, but still follow the links on this page, allowing them to get to the pages found in the search.
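
If the search pages come out of a template, the tag can be added conditionally, so the base page stays indexable while only paginated views get noindex. A minimal sketch, assuming a dict of query parameters is available; robots_meta is a hypothetical helper, not a library function.

def robots_meta(query_params: dict) -> str:
    # Hypothetical helper: emit a robots meta tag only for paginated
    # views such as /XXXXX?page=2; the base /XXXXX page gets no tag
    # and keeps the default index,follow behaviour.
    if query_params.get("page"):
        return '<meta name="robots" content="noindex,follow">'
    return ""

print(robots_meta({"page": "2"}))  # paginated search page -> noindex,follow
print(robots_meta({}))             # base page -> empty string, nothing emitted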

Phillip Knauss
+2  A: 
  1. Create a text file and name it: robots.txt
  2. Add user agents and disallow sections (see sample below)
  3. Place the file in the root of your site

Sample:

###############################
#My robots.txt file
#
User-agent: *
#
#list directories robots are not allowed to index 
#
Disallow: /testing/
Disallow: /staging/
Disallow: /admin/
Disallow: /assets/
Disallow: /images/
#
#
#list specific URLs robots are not allowed to index
#
Disallow: /startup?page=2
Disallow: /startup?page=3
# 
#
#End of robots.txt file
#
###############################

Here's Google's actual robots.txt file: http://www.google.com/robots.txt

You can find some good information in the Google Webmasters help topic on blocking or removing pages using a robots.txt file.
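
Since the sample lists each page number by hand, the file can also be generated. A short sketch follows, where the section paths and the page range are only illustrative assumptions; a single wildcard rule, as in the other answers, is less brittle if the crawlers you care about support it.

# Sketch: write a robots.txt that enumerates paginated URLs explicitly.
sections = ["/startup"]   # add the other /XXXXX sections here (assumed names)
max_page = 10             # highest page number to block (assumed)

lines = ["User-agent: *"]
lines += [f"Disallow: {s}?page={n}"
          for s in sections
          for n in range(2, max_page + 1)]

with open("robots.txt", "w") as fh:
    fh.write("\n".join(lines) + "\n")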

Metro Smurf
Thanks for the answer. I forgot to add another detail: the startup part can be random, e.g. /XXXXX?page
pmarreddy
Using this method you'd have to manually add every ?page=(number); according to the spec, you can leave that part off.
meder