How does Google Toolbar determine the language of a page to offer translation from it?

Google is misidentifying a simple login page on our site as Filipino and offering to translate it into English. I've tried adding a lang="en" attribute to the <html> element of the page, but that seems to have made no difference.

Anyone know why this is happening?

Edit: It's a login page. The text of the page consists only of the following:

Admin
Log Out
Admin Panel Login
Username
Password

Plus a logo and some input boxes.

When I press the translate button, it doesn't seem to change anything.

+1  A: 

According to this article on multilingual websites from the Google Webmaster blog, Google's crawlers ignore language metadata such as the "lang" attribute and infer the language from the page content. Their explanation is that the lang attribute is sometimes auto-generated and therefore not reliable. Adding more well-formed English text to the page may fix the problem, although submitting a bug report to Google is a better fix than padding the page with arbitrary English text.

Michael Aaron Safyan
+1  A: 

One way you can fix this problem is to let Google know it made a mistake in translating your page. It's not a real solution though, especially if a whole website is affected by this issue.

Prutswonder
Any idea what this does? Do they add the page to a mistranslation list or do they just look at it to try and improve their heuristics?
Martin Smith
Probably both, but unless someone from Google confirms it, it will probably remain a Google Mystery(tm).
Prutswonder
I've added the page in question, and it didn't fix the problem immediately. Hopefully it will fix it in time, though. How did you find this page? Is its use documented somewhere?
rjmunro
Prutswonder
A: 

I had this problem on an aspx form I was making. By process of elimination, I found that in my case the cause was my calendar control. In my skin I was setting DayNameFormat="Shortest" on the calendar. With this property set I had the issue; without it I did not. The property shortens the day-of-week headers from "Mon" to "Mo". I'm speculating that Google's language inference was reading "words" like "Mo" and "Tu" and using them to guess that the page was Filipino. Since I didn't have many other words on the page, these must have carried enough weight to tip the result to Filipino.
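
To make the "not many other words" point concrete, here is a rough Python sketch. It does not reproduce Google's detector, whose workings aren't public as far as I know. It only measures how large a share of a sparse page's words the two-letter day headers take up. The page text is a hypothetical stand-in (the login labels from the question plus the calendar headers), and `short_token_share` is a made-up helper name.

```python
import re

# Hypothetical stand-in for a sparse page: a few UI labels plus the
# calendar's day headers in the two formats being compared.
PAGE_LABELS = "Admin Log Out Admin Panel Login Username Password"
SHORTEST_DAYS = "Su Mo Tu We Th Fr Sa"          # DayNameFormat="Shortest"
SHORT_DAYS = "Sun Mon Tue Wed Thu Fri Sat"      # DayNameFormat="Short"


def short_token_share(text, max_len=2):
    """Return the fraction of word tokens that are at most max_len letters long."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    short = [t for t in tokens if len(t) <= max_len]
    return len(short) / len(tokens)


with_shortest = short_token_share(PAGE_LABELS + " " + SHORTEST_DAYS)
with_short = short_token_share(PAGE_LABELS + " " + SHORT_DAYS)

print(f"two-letter share with Shortest: {with_shortest:.2f}")
print(f"two-letter share with Short:    {with_short:.2f}")
```

On this stand-in text, nearly half of the words are two-letter fragments when the shortest day names are used, and none are when the three-letter names are used. That fits the guess that a content-based detector has very little real English to go on, but it is only an illustration, not evidence of how Google weighs anything.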

Hope that helps!

jMo
