views: 58
answers: 2
I want to protect my website from site copiers. I have looked at the Ajax Toolkit NoBot, but unfortunately it does not meet my requirements.

My requirements are:

  1. Only 0.5% of my pages have post-backs; the rest look like static pages. So detection should happen at the initial request, not at post-back.
  2. At the same time, I want to allow search-engine crawlers. What is the best way to detect search bots? Is checking the user agent not the right way?

Also, is it possible to obfuscate the page content by padding extra words (my site URL, etc.) into the middle of the content, words that would not be displayed on my website? These padded words should not be easy to strip out with either jQuery (client-side) or HTMLDocument (server-side) code.

Any abstract idea is also welcome.

If your answer is simply "no", please do not answer; instead, suggest whatever approaches are possible.

+3  A: 

You cannot. If you let both your visitors and Google's bots see your data, it is impossible to block only the unwanted crawlers; anyone can pretend to be Google.


You can however block folks that try to steal your data, for example:

Turn the IP address of the request into a bit array (which looks like 1000101011100 or something), then work through all the spaces in the text: wherever the corresponding bit is 0, replace the single space with two spaces.

When you find a website that has copied your text, look at its source; from the whitespace pattern you can extract the copier's IP address and block that IP at your web servers.
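A rough sketch of this scheme (function names are mine, IPv4 only; note that any intermediate tooling that normalizes whitespace destroys the mark, and recovery assumes the copied text contains no other double spaces):

```python
import ipaddress
import re

def watermark(text: str, ip: str) -> str:
    """Encode the visitor's IPv4 address into the word spacing:
    a double space marks a 0 bit, a single space marks a 1 bit.
    Browsers collapse runs of spaces, so the rendered page looks
    unchanged; the pattern survives only in the page source."""
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    words = text.split(" ")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        sep = "  " if bits[i % 32] == "0" else " "
        out.append(sep + word)
    return "".join(out)

def recover_bits(text: str) -> str:
    """Read the spacing pattern back out of a copied page."""
    return "".join("0" if len(gap) == 2 else "1"
                   for gap in re.findall(r" +", text))
```

With texts longer than 32 word gaps the bit pattern repeats, which also gives you redundancy when only a fragment of the page is copied.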

Jan Jongboom
+1 for an interesting approach.
Chris Lively
+1  A: 

For obfuscating the content, you don't want to rely on JavaScript to hide the padded words on the client side, because users without JavaScript enabled will see nonsense on your page. (Not to mention screen readers and other accessibility concerns with the resulting HTML.) If you must obfuscate the text like that, at the very least do so with CSS instead of JavaScript, because it degrades more gracefully, but I still don't recommend it.
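For illustration, the CSS-based padding could be sketched like this (the helper and the class name `x9q` are mine, not from the answer). Note that a scraper can strip any fixed class with a single jQuery selector, which is exactly why this fails the "not easily removable" requirement:

```python
import random

# Assumed stylesheet rule on the site:  .x9q { display: none; }
HIDDEN_CLASS = "x9q"

def pad_with_hidden_word(paragraph: str, marker: str = "mysite.example") -> str:
    """Insert the marker at a random interior word boundary, wrapped
    in a span the site's CSS hides, so it appears only in copied
    markup, never on screen. Expects at least two words."""
    words = paragraph.split(" ")
    pos = random.randrange(1, len(words))
    words.insert(pos, f'<span class="{HIDDEN_CLASS}">{marker}</span>')
    return " ".join(words)
```

Randomizing the insertion point (and, in practice, the class name per page) makes stripping slightly harder, but never hard: the copier only needs to diff the rendered text against the markup once.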

Out of curiosity, what is the purpose for this effort? By making something publicly available on the internet, its very nature is to be copyable. What specifically are you trying to prevent and why?

David
Not true, bots that want to steal your data will just set their user agent to something like GoogleBot.
Jan Jongboom
@Jan: Ah, good point. Editing the answer now.
David