Hi, I have a bunch of big txt files (game walkthroughs) that I need translating from English to French. My first instinct was to host them on a server and use a PHP script to automate the translation process by doing a file_get_contents() and some URL manipulation to get the translated text. Something like:

http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt

I found this poses two problems: 1) the result is wrapped in frames, and 2) the frame src values are relative (i.e. src="/translate_c?....") so nothing loads.

Is there any way to fetch pages translated via Google in PHP (without using their AJAX API as it's really not suitable here)?

+1  A: 

Use cURL to get the resulting page and then parse it.
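A minimal sketch of the fetch step, assuming the cURL extension is enabled. `fetch_page()` is a made-up helper name and the 30-second timeout is an arbitrary choice, so adjust both as needed:

```php
<?php
// Sketch only: fetch a URL with cURL and return the response body as a string.
// fetch_page() is a hypothetical helper, not part of PHP or any library.
function fetch_page(string $url, int $timeoutSeconds = 30): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // assumption: 30 s is long enough
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released automatically when $ch goes out of scope.
    return $body;
}
```

You would then call it with the translate URL from the question, e.g. `$html = fetch_page('http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt');`, and parse `$html` from there.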

Oren
Thanks, but wouldn't the IFrames still be a problem?
mathon12
Use cURL to get the URL posted by Chaim above and you should be set. This should also avoid complications with JavaScript, because it will only grab the HTML.
Oren
I'm going with cURL, thanks!
mathon12
Would appreciate if you checked this as the correct answer then. Good luck.
Oren
+1  A: 

Instead of using the regular translate URL which has frames, use the src of the frame:

http://translate.googleusercontent.com/translate_c?hl=<INTERFACE LANGUAGE>&sl=<SOURCE LANGUAGE>&tl=<TARGET LANGUAGE>&u=http://<URL TO TRANSLATE>&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg

For example to translate the page http://chaimchaikin.za.net/ from English to Afrikaans:

http://translate.googleusercontent.com/translate_c?hl=en&sl=en&tl=af&u=http://chaimchaikin.za.net/&rurl=translate.google.com&twu=1&usg=ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg

This will open up only a "frameless" page of the translation.
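If you are building these URLs in PHP, something along these lines should work. It is only a sketch: `build_translate_url()` is a hypothetical helper, the parameter names are simply copied from the example URL above, and I am assuming the server accepts a percent-encoded `u` value (which is what `http_build_query()` produces).

```php
<?php
// Sketch only: assemble a "frameless" translate_c URL from its parts.
// build_translate_url() is a hypothetical helper; the parameter names
// (hl, sl, tl, u, rurl, twu, usg) are taken from the example URL above.
function build_translate_url(string $pageUrl, string $source, string $target, string $usg): string
{
    $params = [
        'hl'   => 'en',                   // as in the example URL
        'sl'   => $source,                // source language code, e.g. 'en'
        'tl'   => $target,                // target language code, e.g. 'af' or 'fr'
        'u'    => $pageUrl,               // page to translate
        'rurl' => 'translate.google.com',
        'twu'  => '1',
        'usg'  => $usg,                   // key copied from a real request
    ];

    // http_build_query() percent-encodes the values, including the nested URL.
    return 'http://translate.googleusercontent.com/translate_c?'
        . http_build_query($params, '', '&');
}
```

For the example above that would be `build_translate_url('http://chaimchaikin.za.net/', 'en', 'af', 'ALkJrhhxPIf2COh7LOgXGl4jZdEBNutZAg')`.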

You may want to examine and test around to find the codes for the required language. Also bear in mind that Google may add scripts to the translation (for example to show original text on hover).

EDIT: It appears, on examining the code, that there is a lot of JavaScript in between the translation. You may need to find a way to get rid of it.
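One rough way to get rid of it, since the source files are plain text anyway, is to drop the script blocks and then strip the remaining tags. This is a regex-based sketch rather than a proper HTML parser, `html_to_plain_text()` is a made-up name, and I have not checked it against the real translated output:

```php
<?php
// Sketch only: reduce a translated HTML page to plain text.
// html_to_plain_text() is a hypothetical helper. A regex is a crude tool
// for HTML and may miss unusual markup; treat this as a starting point.
function html_to_plain_text(string $html): string
{
    // Remove <script> and <style> elements together with their contents.
    $withoutScripts = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    if ($withoutScripts === null) {
        // preg_replace() returns null on error, e.g. a backtrack limit on huge input.
        throw new RuntimeException('Could not strip script blocks');
    }

    // Drop the remaining tags, then decode entities such as &amp; or &eacute;.
    $text = html_entity_decode(strip_tags($withoutScripts), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return trim($text);
}
```

Note that any original-language text embedded in the markup (for the hover feature mentioned above) would survive `strip_tags()` and would need separate handling.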

EDIT: Further examination shows that the end bit "usg=ALkJr..." seems to change every time. Maybe first run a request on the regular Google translate page (e.g. http://translate.google.com/translate?hl=fr&sl=en&u=http://mysite.com/faq.txt), then find and parse the "usg=.." part and use it for your next request on the "frameless" page (http://translate.googleusercontent.com/translate_c?...).
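Pulling the key out of the first response could look roughly like this. It is a sketch under one big assumption: that the framed page contains a frame whose src looks like `src="/translate_c?...usg=..."`, as described in the question. `extract_usg()` is a hypothetical helper name.

```php
<?php
// Sketch only: find the "usg" key in the HTML of the framed translate page.
// Assumes a frame with src="...translate_c?...usg=..." is present;
// extract_usg() is a hypothetical helper and returns null if nothing is found.
function extract_usg(string $framesetHtml): ?string
{
    // Capture the src attribute of the frame that points at translate_c.
    if (!preg_match('#src="([^"]*translate_c\?[^"]*)"#i', $framesetHtml, $match)) {
        return null;
    }

    // Attribute values are HTML-escaped, so turn &amp; back into &.
    $src = html_entity_decode($match[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Everything after the first "?" is the query string.
    $query = substr($src, strpos($src, '?') + 1);
    parse_str($query, $params);

    return (isset($params['usg']) && is_string($params['usg'])) ? $params['usg'] : null;
}
```

The returned value can then be dropped into the usg parameter of the translate_c request. If the function returns null, the page layout is probably not what this sketch assumes.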

Chaim Chaikin
Hi, thanks, I had already tried doing that but it seems inconsistent with the frames (they seem to still be there...). I think a lot depends on that key at the end (usg). I'll play around with these ideas now, thanks.
mathon12