Let's say several external sites are scraping/harvesting your content and posting it as their own. Let's also say that you maintain a single unique/permanent URL for each piece of content, so that content aliasing (on your site) is never an issue.

Is there any SEO value in including a canonical link in your header anyway, so that when your site is "scraped", the canonical indication is carried over to whatever site is stealing your content (assuming they harvest the raw HTML rather than going in through RSS, etc.)?
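To make the idea concrete, here is a minimal sketch of generating such a tag. The site root and the helper name `canonical_link_tag` are made up for illustration. The one detail that matters for the scraping case is that the `href` is absolute: a relative URL copied verbatim onto another domain would resolve against that domain rather than yours.

```python
from html import escape
from urllib.parse import urljoin

# Hypothetical site root, used only for illustration.
SITE_ROOT = "https://www.example.com/"


def canonical_link_tag(path: str, site_root: str = SITE_ROOT) -> str:
    """Build a canonical <link> tag with an absolute URL for the given path."""
    # Resolve the path against the site root so the href names the original
    # site even if the markup is copied to a different domain.
    absolute_url = urljoin(site_root, path)
    # Escape the URL so characters like & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(absolute_url, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("/articles/42"))
```

Whether a search engine honors this tag when it shows up on someone else's domain is exactly the open question here; the sketch only shows what would be injected.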

I've heard different things about the behavior of cross-site canonical links, from "they're ignored" to "behavior undefined" to "it can't hurt" to "sure that's exactly what canonical is intended for". My impression was that canonical was a good way of dealing with intra-site but not necessarily inter-site aliasing.

+5  A: 

I can't answer your question directly.

You (someone in your company) should contact the parties who are syndicating your content without permission, and try to get them to do it with permission. You should clarify your policy on unauthorised syndication. This is of course a business decision and your business development / process people and IP lawyers will probably have to get involved.

If they persist and you absolutely need to get them to stop, you can start serving junk to their robots. Detecting their robots may be nontrivial, as they will probably be forging a "real" user-agent header and using varying IP addresses (most miscreants seem to use EC2 these days). However, if you are successful, their web sites will become full of junk.
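As a rough sketch of the mechanics, assuming you have already worked out which address ranges the robots come from (that is the hard part, and nothing here solves it): the networks below are reserved documentation ranges used as placeholders, and the function names are made up for illustration.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), standing in for
# whatever ranges you have actually observed the scrapers using.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_suspected_scraper(remote_addr: str) -> bool:
    """Return True if the client address falls inside a suspect network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: treat as a normal visitor rather than guess.
        return False
    return any(addr in network for network in SUSPECT_NETWORKS)


def choose_body(remote_addr: str, real_body: str, decoy_body: str) -> str:
    """Pick the decoy page for suspected scrapers, the real page otherwise."""
    return decoy_body if is_suspected_scraper(remote_addr) else real_body


if __name__ == "__main__":
    print(choose_body("203.0.113.7", "real article", "junk"))
    print(choose_body("192.0.2.1", "real article", "junk"))
```

Matching on address alone will misfire if the ranges are shared with legitimate visitors or crawlers, so the list needs to be kept narrow and reviewed.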

Once their web sites become full of junk (or worse) then you can contact them again asking them if they'd like to stop their obnoxious behaviour.

MarkR
+1 for feeding robots junk. Everyone knows that is their favorite food.
Sky Sanders
I don't normally advocate feeding robots junk; ideally you should persuade the robot users to stop of their own accord. Feeding robots junk can have bad effects you don't want.
MarkR