I'm looking for ways to prevent indexing of parts of a page. Specifically, comments on a page, since they add a lot of weight to entries based on what users have written. As a result, a Google search of the site returns lots of irrelevant pages.

Here are the options I'm considering so far:

1) Load comments using JavaScript to prevent search engines from seeing them.

2) Use user agent sniffing to simply not output comments for crawlers.

3) Use search engine-specific markup to hide parts of the page. This solution seems quirky at best, though. Allegedly, this can be done to prevent Yahoo! from indexing specific content:

<div class="robots-nocontent">
This content will not be indexed!
</div>

That is a very ugly way to do it. I read about a Google solution that looks better, but I believe it only works with the Google Search Appliance (can someone confirm this?):

<!--googleoff: all-->
This content will not be indexed!
<!--googleon: all-->

Does anyone have other methods to recommend? Which of the three above would be the best way to go? Personally, I'm leaning towards #2: while it might not work for all search engines, it's easy to target the biggest ones, and it has no side effects for users unless they're deliberately trying to impersonate a web crawler.
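
For illustration, here's roughly what I mean by #2 (a minimal sketch assuming a Node.js backend; the bot pattern and the renderPage helper are just placeholders):

var http = require('http');

// Illustrative pattern only; extend with whatever crawlers you care about.
var BOT_PATTERN = /googlebot|slurp|msnbot/i;

function isCrawler(req) {
  var ua = req.headers['user-agent'] || '';
  return BOT_PATTERN.test(ua);
}

// Placeholder page renderer: only includes the comments block
// when the request does not look like a crawler.
function renderPage(withComments) {
  var html = '<html><body><h1>Entry</h1>';
  if (withComments) {
    html += '<div id="comments">...user comments here...</div>';
  }
  return html + '</body></html>';
}

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end(renderPage(!isCrawler(req)));
}).listen(8080);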

+1  A: 

googleoff and googleon are for the Google Search Appliance, which is a search engine they sell to companies that need to search through their own internal documents. It's not effective for the live Google site.

I think number 1 is the best solution, actually. Search engines don't like it when you serve them different material than you serve your users, so number 2 could get you kicked out of the search listings altogether.

Emil Vikström
+4  A: 

I would go with your JavaScript option. It has two advantages:

1) Bots don't see it.

2) It would speed up your page load time (load the comments asynchronously and unobtrusively, e.g. via jQuery). Page load times have a much-underrated positive effect on your search rankings.
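
For example, a minimal sketch of that approach (the #comments container and the /comments URL are placeholders):

// Ship the page without comments, then pull them in after load.
$(function () {
  $('#comments').load('/comments?entry=123');
});

Crawlers that don't execute JavaScript never see the fetched markup, and the initial HTML stays small.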

mattRo55
A: 

This is the first I have heard that search engines provide a method for informing them that part of a page is irrelevant.

Google does provide tools for webmasters to declare which parts of their site a search engine should use to find pages when crawling:

  1. http://www.google.com/webmasters/
  2. http://www.sitemaps.org/protocol.php

You might be able to relatively de-emphasize some things on the page by specifying the most relevant keywords using META tag(s) in the HEAD section of your HTML pages. I think that is more in line with the engineering philosophy used to architect search engines in the first place.
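
For example, something along these lines (the title, description, and keyword values are made up purely for illustration):

<head>
  <title>Acme 2000 Toaster Review</title>
  <meta name="description" content="Hands-on review of the Acme 2000 toaster.">
  <meta name="keywords" content="toaster, review, Acme 2000">
</head>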

Look at Google's Search Engine Optimization tips. They spell out clearly what they will and will not let you do to influence how they index your site.

JohnnySoftware
+3  A: 

JavaScript is an option, but engines are getting better at reading JavaScript. To be honest, I think you're reading too much into it. Engines love unique content: the more content you have on each page the better, and if the users are providing it... it's the holy grail.

Just because a commenter made a reference to Star Wars on your toaster review doesn't mean you're not going to rank for the toaster model; it just means you might also rank for "star wars toaster".

Another idea: you could show comments only to people who are logged in. CollegeHumor does the same, I believe; they show the number of comments a post has, but you have to log in to see them.
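
A rough sketch of how that could look on the server side (the function and data names are made up):

// Logged-out visitors (and crawlers) only ever see the comment count.
function renderCommentsSection(isLoggedIn, comments) {
  if (!isLoggedIn) {
    return '<p>' + comments.length + ' comments - log in to read them</p>';
  }
  return '<div id="comments">' +
    comments.map(function (c) { return '<p>' + c + '</p>'; }).join('') +
    '</div>';
}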

Dom Hodgson
I don't think you see just how big the comment/page content ratio is. If you search for, for example, "how to register", you get lots of comment hits on irrelevant pages before actually getting the page that has information about how to register, simply because out of the hundreds of comments that some pages have, several will be talking about registering.
Blixt
yeah I see that now, editing my original answer
Dom Hodgson