I can't seem to find any information on how Google determines whether you are cloaking your content. How, from a technical standpoint, do you think they determine this? Are they sending in crawlers other than Googlebot and comparing the results to what Googlebot sees? Do they have a team of human beings comparing? Or can they somehow tell that you checked the user agent and executed a different code path because you saw "googlebot" in the name?

It's in relation to this question on legitimate URL cloaking for SEO: if the textual content is exactly the same but the rendering is different (1995-style HTML vs. AJAX vs. Flash), is there really a problem with cloaking?
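
To make the question concrete, here is the kind of user-agent check I mean. It's a hypothetical sketch (the Flask app and the /article route are made up purely for illustration), not anyone's real code:

    # Serve a different code path when the User-Agent header mentions "googlebot".
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/article")
    def article():
        ua = request.headers.get("User-Agent", "").lower()
        if "googlebot" in ua:
            # The crawler gets plain 1995-style HTML containing the full text.
            return "<html><body><h1>Title</h1><p>Full article text.</p></body></html>"
        # Regular visitors get a shell page that loads the same text via AJAX.
        return "<html><body><div id='app'></div><script src='/app.js'></script></body></html>"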

Thanks for your input on this one.

+1  A: 

Google looks at your site while presenting user-agents other than Googlebot.

Anon.
They do? And does this other user-agent still identify itself as some kind of robot? If not, would that not be very sneaky on Google's part?
Thilo
Please provide a source
Joe Philllips
Even different user agents can't help Google tell if a page has used z-index to overlay a div and hide certain content from view - does this qualify as "cloaking"?
John K
@jdk: google has created a browser with a rendering engine. They very well could tell.
whatsisname
Okay then, that's kind of what I posted below as a solution - I wasn't sure if my understanding really met the definition of cloaking but it appears it does or is close enough.
John K
@Thilo: Sneaky? I guess different people will have different takes, but I think it is OK as long as they respect robots.txt
Charles Stewart
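
A minimal sketch of the comparison this answer and its comments describe: fetch the same URL once with a Googlebot User-Agent and once with an ordinary browser User-Agent, then diff the responses. The user-agent strings and the similarity threshold here are my own assumptions; Google's actual pipeline is not public.

    import difflib
    import urllib.request

    GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0 Safari/537.36"

    def fetch(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def looks_cloaked(url, threshold=0.90):
        """Flag the page if the two responses differ by more than the threshold."""
        as_bot = fetch(url, GOOGLEBOT_UA)
        as_browser = fetch(url, BROWSER_UA)
        similarity = difflib.SequenceMatcher(None, as_bot, as_browser).ratio()
        return similarity < threshold

    print(looks_cloaked("http://example.com/"))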
+1  A: 

See page 11 of the Google Chrome comic book, where it describes (in better than layman's terms) how a Google tool can take a schematic of a web page. They could be using this or similar technology for Google search indexing and cloak detection - at least that would be another good use for it.


John K
Can you explain a little how this (which is about automated testing of a rendering engine) relates to cloak detection?
Thilo
I'm speculating the technology could be repackaged to capture "what the browser thinks it's displaying" and compare it against what Googlebot actually scrapes. It wouldn't be unlike TestSwarm for jQuery (http://testswarm.com/), but Google would use server farms for it. Yeah, it's out there, but it has shreds of viability.
John K
My explanation is probably not very clear, but basically I'm saying that if Google (via Chrome) can build technology that demonstrates a difference between what a web browser "thinks" it sees and what is actually seen, then it is not infeasible that they also have other technologies comparing the "thinking" vs. "seeing" web.
John K
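
A speculative sketch of that "thinking vs. seeing" comparison: extract the text present in the raw HTML a crawler scrapes, extract the text a real rendering engine ends up displaying, and compare the two. Headless Chrome driven by Selenium stands in here for whatever Google would actually use; this is a guess at the idea, not their cloak-detection pipeline.

    import difflib
    import urllib.request
    from html.parser import HTMLParser

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    class TextExtractor(HTMLParser):
        """Collects visible text nodes from the raw, unrendered HTML."""
        def __init__(self):
            super().__init__()
            self.chunks = []
            self._skip = 0
        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self._skip += 1
        def handle_endtag(self, tag):
            if tag in ("script", "style") and self._skip:
                self._skip -= 1
        def handle_data(self, data):
            if not self._skip and data.strip():
                self.chunks.append(data.strip())

    def scraped_text(url):
        # What the crawler "sees": text nodes in the HTML it downloads.
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        parser = TextExtractor()
        parser.feed(html)
        return " ".join(parser.chunks)

    def rendered_text(url):
        # What the browser "thinks it is displaying": the rendered body text.
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            return driver.find_element(By.TAG_NAME, "body").text
        finally:
            driver.quit()

    url = "http://example.com/"
    ratio = difflib.SequenceMatcher(None, scraped_text(url), rendered_text(url)).ratio()
    print("scraped vs. rendered similarity:", ratio)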
+3  A: 

As far as I know, how Google prepares search engine results is secret and constantly changing. Spoofing different user-agents is easy, so they might do that. They might also, in the case of JavaScript, actually render partial or entire pages. "Do they have a team of human beings comparing?" That is doubtful. A lot has been written on Google's crawling strategies, including this, but if humans are involved, they're only called in for specific cases. I even doubt that: any person-power spent is probably spent tweaking the crawling engine.

Yar
