I run a specialized news site and am trying to apply a little bit of SEO sauce to it. One of the most important things I hear is to avoid duplication of content. I've covered all the basics but I'm stuck with ordering of content.

As an example, the archive of the site can be ordered by date, views, and rating. Since we don't have that many news items, an archive page for a particular day usually has only a couple of items, so the following URLs all have the same content, albeit in a different order:

  • /news/archive/2010/05/16/
  • /news/archive/2010/05/16/?o=views
  • /news/archive/2010/05/16/?o=rating

Do search engines penalize this particular kind of content duplication? And if so, what's the best way to avoid the penalty? <link rel="canonical" />? Telling Google & Co. to ignore the o parameter? Marking the ordering links with nofollow? Allowing only the date-ordered archive pages to be indexed via robots.txt (not sure if this is even possible)?

A: 

I don't know whether it affects search engine rankings (it probably does). In Google Webmaster Tools you can specify which query string parameters to ignore (see Site Configuration/Settings). You can also add this to robots.txt:

User-agent: *
Disallow: /news/archive/*/*/*/?*o=

This blocks the crawler even if the URL has other query parameters, as in

/news/archive/2010/05/16/?direction=asc&o=date
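To sanity-check which URLs a wildcard rule like this would catch, here is a minimal sketch in Python. It assumes the wildcard semantics that Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end, and rules otherwise match as prefixes); other crawlers may interpret wildcards differently or not at all. The helper names `robots_pattern_to_regex` and `is_blocked` are made up for this illustration.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path_and_query, pattern):
    """Return True if the path (including query string) matches the rule."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None

RULE = "/news/archive/*/*/*/?*o="

for candidate in (
    "/news/archive/2010/05/16/",
    "/news/archive/2010/05/16/?o=views",
    "/news/archive/2010/05/16/?direction=asc&o=date",
):
    print(candidate, "->", "blocked" if is_blocked(candidate, RULE) else "allowed")
```

One thing the sketch makes visible: the rule matches any parameter whose name merely ends in "o" (for example `?foo=1`), so it is worth testing it against the site's real parameter names before relying on it.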
Artefacto
+1  A: 

If you have a dynamic URL which is in the standard format like foo?key1=value&key2=value2 we recommend that you leave the url unchanged, and Google will determine which parameters can be removed

http://googlewebmastercentral.blogspot.com/2008/09/dynamic-urls-vs-static-urls.html

Basically, Google doesn't care about that. Googlebot is smart enough to handle this issue for you.

I always use a canonical tag; it seems cleaner to me.
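For the archive pages in the question, the canonical tag could be generated by dropping the ordering parameter from the requested URL. This is only a sketch: it assumes `o` is the sole parameter that does not change the content, `example.com` stands in for the real host, and `canonical_url` and `canonical_link_tag` are hypothetical helper names, not part of any framework.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url, ignored=("o",)):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # Rebuild the URL without the ignored parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)

# example.com is a placeholder host for illustration.
print(canonical_link_tag("http://example.com/news/archive/2010/05/16/?o=views"))
```

Every ordering of a given day's archive then points at the same date-ordered URL, so the duplicates are consolidated without blocking the crawler.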

Ben