views: 61
answers: 4

hi,

A section of my website is accessible only to authenticated users. I was wondering whether these pages are crawled by Google, or whether they are kind of "hidden" from the search engine.

thanks

+5  A: 

If they are closed to users who are not authenticated, they are of course also closed to Google. The Google bot is nothing but another client trying to access your site.

Some sites, like newspapers, have content that is reserved for paying users yet still shows up in search engines. That is always a conscious act on the part of the webmaster to open the site up to search engine bots, even though the bots are not paying customers.

Search engines have no "special key" to get into the house.
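
To illustrate the point (a minimal sketch, assuming a Flask-style app; the route and session key names are made up): Googlebot arrives with no session cookie, so it hits exactly the same redirect-to-login branch as any other anonymous visitor.

from flask import Flask, redirect, session, url_for

app = Flask(__name__)
app.secret_key = "change-me"

@app.route("/members/report")
def members_report():
    # Googlebot sends no session cookie, so this check fails for it
    # just as it does for any other unauthenticated client.
    if not session.get("user_id"):
        return redirect(url_for("login"))
    return "Members-only content"

@app.route("/login")
def login():
    return "Please log in", 401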

Pekka
So all it takes to read those sites for free is changing your UA string to that of a search bot? :P
Bart van Heukelom
@Bart In fact, it is possible. There are sites where that is enough (I think it was possible for some time to read the articles of a major newspaper that way, was it the New York Times?). However, most sites also match the client IP against known Google IPs.
Pekka
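
A rough sketch of the IP check Pekka mentions (Python assumed; this is the reverse-then-forward DNS lookup Google recommends for verifying Googlebot, not any particular site's actual code):

import socket

def is_real_googlebot(ip):
    """Confirm an IP really belongs to Googlebot via reverse, then forward, DNS."""
    try:
        host = socket.gethostbyaddr(ip)[0]  # e.g. 'crawl-66-249-66-1.googlebot.com'
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward lookup on that host must give the same IP back.
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False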
Also, isn't that against Google's rules? They're getting higher rankings while not actually offering the information to everyone.
Bart van Heukelom
Actually, I once ran into a gotcha when using phpBB's authentication system for my website, because that *does* have such a "special key" enabled by default (search engine bots are automatically authenticated unless you *disable* that 'feature'). But yes, apart from that, there's nothing to worry about if the authentication works. +1
Boldewyn
@Bart I guess they have deals with Google. After all, Google is interested in getting content into its News service, and also into the index.
Pekka
@Boldewyn wow! Enabled by default? *That* is bad. Good to know.
Pekka
@Pekka: Well, it's perhaps what you might want for a forum. I just never thought of this when I used it for the whole website, and wondered why the contents of my website showed up in Google searches...
Boldewyn
@Boldewyn yeah, but a forum can also be private in itself. I bet there is a truckload of confidential forums out there that have this enabled.
Pekka
+2  A: 

If you are still unsure, you can query Google with "site:yoursite.com" and check the result pages.

Jason
+2  A: 

As a web crawler is just another client trying to access your site, the authenticated area will be inaccessible to the crawler too.

If you want to tell web crawlers not to index other parts of your website, use a file called robots.txt placed in the root directory of your site. For example:

robots.txt

User-agent: *
Disallow: /hidden

This asks all web crawlers not to index content inside the 'hidden' directory.
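
If you want to check how a well-behaved crawler interprets those rules, Python's standard urllib.robotparser can evaluate them (the example.com URLs are just placeholders):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /hidden",
])

# A compliant bot checks before fetching:
print(rp.can_fetch("Googlebot", "http://example.com/hidden/page.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/public/page.html"))  # True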

greenie
Agreed - but the robots.txt file does not actively "prevent" crawlers from indexing content; it merely asks them not to do so.
Neil Moss
Of course, how silly of me. Edited.
greenie
+1  A: 

If your site has links to the pages which require authentication then, yes, Google will attempt to crawl them. It is down to you to ensure that unauthenticated users are not served the protected content.

As Greenie suggests, use the Robots.txt file to tell search engines not to attempt to crawl your protected content.

Remember that obeying the instructions in robots.txt is voluntary. There is nothing to stop a web crawler from actually requesting such content, and in that case a robots.txt file can amount to a note on the front door saying "Valuable stuff here!!".
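
To make that concrete (a trivial Python sketch; the URL is a placeholder): nothing in HTTP forces a client to read robots.txt at all, so the only thing that genuinely keeps a rude crawler out is the server-side authentication check.

import urllib.request, urllib.error

# A rude crawler simply requests the page and never consults robots.txt.
try:
    resp = urllib.request.urlopen("http://example.com/hidden/page.html")
    print("fetched anyway:", resp.status)
except urllib.error.HTTPError as e:
    print("server refused:", e.code)  # e.g. 401/403 when the page is genuinely protected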

Neil Moss