Hello,

On one of my sites I have a lot of restricted pages which are only available to logged-in users; for everyone else, each of them outputs a default "you have to be logged in ... " view.

The problem is: a lot of these pages are listed on Google with the not-logged-in view, and it looks pretty bad when 80% of the pages in the results have the same title and description/preview.

Would it be a good choice to send a 401 Unauthorized header along with my default not-logged-in view? And would this stop Google (and other engines) from indexing these pages?
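
To make it concrete, here is roughly what I mean, as a minimal sketch (Flask is only an example here; the route and session key are made up):

from flask import Flask, session

app = Flask(__name__)
app.secret_key = "change-me"  # needed for sessions

@app.route("/members/profile")
def profile():
    if "user_id" not in session:
        # Serve the usual "you have to be logged in ..." view,
        # but with status 401 instead of 200.
        return "You have to be logged in to view this page.", 401
    return "Restricted profile content."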

Thanks!

(and if you have another (better?) solution I would love to hear about it!)

+1  A: 

Use a robots.txt file to tell search engines not to index the not-logged-in pages.

http://www.robotstxt.org/

Ex.

User-agent: *
Disallow: /error/notloggedin.html
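
If the restricted pages share a common path prefix, a single prefix rule covers all of them, so you don't have to list every page (the /members/ path is only an example):

User-agent: *
Disallow: /members/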
David Mårtensson
Yeah, that's what I'm currently working on. I just wanted some feedback on the header method, since it would make things less "painful": you wouldn't have to add every page to Disallow.
jyggen
`robots.txt` is just a suggestion for search engines. If anybody links to your pages, it can still happen that they get added to the search index.
Jan
@Jan: No, not for any of the search engines; they all obey robots.txt. No matter where the link comes from, they won't index the page if it's blocked by robots.txt.
DisgruntledGoat
+1  A: 

Response code 403 is for requests where authentication makes no difference, e.g. disabled directory browsing.
401 Unauthorized would be the response to send.
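
As a rough sketch of the distinction (plain Python, function and parameter names made up):

def status_for(logged_in, allowed):
    if not logged_in:
        return 401  # logging in could change the outcome
    if not allowed:
        return 403  # authentication makes no difference
    return 200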

Jan
Oh, forgot about 401. Edited my question.
jyggen
A: 

Hi, here are the status codes Googlebot understands and how Google recommends handling them: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40132 In your case, an HTTP 403 would be the right one.

Franz