views: 165

answers: 5

Hello,

If you go to the WordPress admin and then Settings -> Privacy, there are two options asking whether you want to allow your blog to be indexed by search engines. One of them is:

I would like to block search engines, but allow normal visitors

How does WordPress actually block search bots/crawlers from indexing this site when the site is live?

+3  A: 

With a robots.txt file (if WordPress is installed in the site root):

 User-agent: *
 Disallow: /

or (from here)

I would like to block search engines, but allow normal visitors - checking this option has these results:

  • Causes <meta name='robots' content='noindex,nofollow' /> to be generated into the <head> section (if wp_head is used) of your site's source, causing search engine spiders to ignore your site.

  • Causes hits to robots.txt to send back:

        User-agent: * 
        Disallow: / 
    

    Note: The above only works if WordPress is installed in the site root and no robots.txt exists.

  • Stops pings to ping-o-matic and any other RPC ping services specified in the Update Services of Administration > Settings > Writing. This works by having the function privacy_ping_filter() remove the sites to ping from the list. This filter is added by having add_filter('option_ping_sites','privacy_ping_filter'); in the default-filters. When the generic_ping function attempts to get the "ping_sites" option, this filter blocks it from returning anything.

  • Hides the Update Services option entirely on the Administration > Settings > Writing panel with the message "WordPress is not notifying any Update Services because of your blog's privacy settings."
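To see how the robots.txt output above affects a well-behaved crawler, here is a quick sketch using Python's standard urllib.robotparser module (the blog path is made up for illustration):

```python
import urllib.robotparser

# The robots.txt body WordPress sends back when the privacy option is on
robots_txt = "User-agent: *\nDisallow: /"

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Every compliant crawler, whatever its user agent, is told to stay out
print(parser.can_fetch("Googlebot", "/2010/05/my-post/"))  # False
print(parser.can_fetch("*", "/"))                          # False
```

Disallow with a bare "/" matches every path on the site, which is why a single two-line file is enough to de-list the whole blog.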

Andy
+1  A: 

I don't know for sure, but it probably generates a robots.txt file that specifies rules for search engines.

thetaiko
+1  A: 

Using a Robots Exclusion file.

Example:

User-agent: Google-Bot
Disallow: /private/
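For what it's worth, you can check how a compliant crawler interprets rules like the one above with Python's standard urllib.robotparser module (the file paths here are just examples):

```python
import urllib.robotparser

rules = "User-agent: Google-Bot\nDisallow: /private/"

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Google-Bot is kept out of /private/ but may crawl everything else;
# agents with no matching record are unrestricted.
print(parser.can_fetch("Google-Bot", "/private/secret.html"))  # False
print(parser.can_fetch("Google-Bot", "/index.html"))           # True
print(parser.can_fetch("OtherBot", "/private/secret.html"))    # True
```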
St. John Johnson
+5  A: 

According to the codex, it's just robots meta tags, robots.txt and suppression of pingbacks:

Causes <meta name='robots' content='noindex,nofollow' /> to be generated into the <head> section (if wp_head is used) of your site's source, causing search engine spiders to ignore your site.

Causes hits to robots.txt to send back:

User-agent: *

Disallow: /

Note: The above only works if WordPress is installed in the site root and no robots.txt exists.

These are "guidelines" that all friendly bots will follow. A malicious spider searching for e-mail addresses or forms to spam will not be affected by these settings.
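The robots meta tag works the same way on the HTML side: a compliant spider parses the page head and skips indexing when it finds noindex. A minimal sketch of that check with Python's standard html.parser (the class name is mine):

```python
from html.parser import HTMLParser

class RobotsMetaChecker(HTMLParser):
    """Detects a <meta name='robots'> tag whose content includes 'noindex'."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name", "").lower() == "robots"
                    and "noindex" in attr.get("content", "").lower()):
                self.noindex = True

# The tag WordPress emits when the privacy option is enabled
page = "<html><head><meta name='robots' content='noindex,nofollow' /></head></html>"
checker = RobotsMetaChecker()
checker.feed(page)
print(checker.noindex)  # True
```

A crawler that honors the tag would drop the page at this point; one that doesn't simply never runs a check like this.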

Pekka
+1  A: 

You can't actually block bots and crawlers from searching through a publicly available site; if a person with a browser can see it, then a bot or crawler can see it (caveat below).

However, there is something called the Robots Exclusion Standard (or robots.txt standard), which allows you to indicate to well-behaved bots and crawlers that they shouldn't index your site. This site, as well as Wikipedia, provides more information.

The caveat to the above comment (that a bot can see whatever your browser can see) is this: most simple bots do not include a JavaScript engine, so anything the browser renders as a result of JavaScript code will probably not be seen by a bot. I would suggest that you don't rely on this as a way to avoid indexing, since the robots.txt standard does not depend on the presence of JavaScript to work correctly.

One last comment: bots are free to ignore this standard; those that do are badly behaved. The bottom line is that anything that can read your HTML can do what it likes with it.

Dancrumb