views: 140

answers: 5
I just want to let Google, Bing, and Yahoo crawl my website to build their indexes, but I do not want competing websites to use crawling services to steal my content. What should I do?

+3  A: 

You can keep Google and the other major engines from indexing your website (for example via robots.txt), but you cannot prevent a malicious crawler from crawling it anyway.
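The standard mechanism is robots.txt: well-behaved crawlers fetch it and honour its rules, while a scraper simply ignores it. A minimal Python sketch using the standard library's robots.txt parser to show the difference; the whitelist rules and the scraper's user-agent name are illustrative:

    import urllib.robotparser

    # Illustrative robots.txt that allows only the big engines' bots
    # and disallows everyone else.
    rules = [
        "User-agent: Googlebot",
        "Allow: /",
        "",
        "User-agent: Bingbot",
        "Allow: /",
        "",
        "User-agent: Slurp",
        "Allow: /",
        "",
        "User-agent: *",
        "Disallow: /",
    ]

    parser = urllib.robotparser.RobotFileParser()
    parser.parse(rules)

    for agent in ("Googlebot", "Bingbot", "Slurp", "SomeScraperBot"):
        print(agent, parser.can_fetch(agent, "/articles/1"))
    # Googlebot, Bingbot and Slurp get True, SomeScraperBot gets False --
    # but only a crawler that chooses to check robots.txt ever sees this.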

Tuomas Pelkonen
Right. A crawler can make itself look exactly like a legitimate user if it goes to enough trouble.
Mark Ransom
A: 

If someone is out to steal your content, they most likely won't care about or obey your restrictions anyway.

The only option I can think of is to find out where they crawl from and block them from seeing the site at all.

Don
Yes, that works if I know which sources they crawl from, but in fact I do not, so I am unsure how to apply whitelist-based access control (rather than a blacklist) to crawlers (see the verification sketch below).
tranhuyhung
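A whitelist does not need the scrapers' addresses; it only needs a way to verify the crawlers you do trust. The major engines document that their bots can be confirmed with a forward-confirmed reverse DNS lookup: reverse-resolve the requesting IP, check the hostname against the engine's domain, then resolve that hostname forward and make sure it points back to the same IP. A minimal Python sketch of that check; the hostname suffixes are assumptions taken from the engines' crawler documentation and should be verified against the current docs:

    import socket

    # Hostname suffixes used by the big engines' crawlers; these are
    # assumptions -- verify them against each engine's current documentation.
    TRUSTED_SUFFIXES = (".googlebot.com", ".google.com",
                        ".search.msn.com", ".crawl.yahoo.net")

    def is_whitelisted_crawler(ip):
        """Forward-confirmed reverse DNS check for a claimed search-engine bot."""
        try:
            host = socket.gethostbyaddr(ip)[0]             # reverse lookup
            if not host.endswith(TRUSTED_SUFFIXES):
                return False
            return ip in socket.gethostbyname_ex(host)[2]  # forward-confirm
        except OSError:
            return False

Anything whose User-Agent claims to be Googlebot but fails this check can be treated like any other anonymous client and blocked, throttled, or sent to a CAPTCHA.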
+1  A: 

I want the world to be able to find me, but I want to be invisible? At least one of us is confused...

mickeyf
I just want to apply whitelist access control for crawlers. I do need my website to be crawled by Google, Bing, and Yahoo for its title and description, but not its full content. Malicious crawlers do not obey the usual rules, so our information can be stolen without permission. Thanks for your answer, but it does not help.
tranhuyhung
+2  A: 

Why not try tracking browsing patterns: if you are getting lots of requests, or browsing patterns that wouldn't come from a human, throw up a CAPTCHA page.
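A minimal sketch of that idea, counting requests per IP over a sliding window; the window and threshold are made-up numbers you would tune for your own traffic, and you would call something like this from your request handler and serve the CAPTCHA page when it returns True:

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60    # look at the last minute of traffic
    MAX_REQUESTS = 120     # assumed threshold; tune it for your site

    _recent = defaultdict(deque)   # ip -> timestamps of recent requests

    def needs_captcha(ip):
        """Return True when an IP's request rate looks non-human."""
        now = time.time()
        hits = _recent[ip]
        hits.append(now)
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()         # drop requests outside the window
        return len(hits) > MAX_REQUESTS

This keeps state in memory per process and only catches single-IP scrapers; a real site would use a shared store and combine the request rate with other signals before showing the CAPTCHA.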

Luke Lowrey
+2  A: 

Try crawling google.com with a custom crawler and see what they do; you can do the same :). Browsing patterns are the key to your problem :).

Sumit Ghosh
Thanks a lot. I will try browsing patterns.
tranhuyhung