views:

258

answers:

3

Hi,

I want to extract string between html tags and convert it into other language using google api and to append the string with html tags.

For example,

<b>This is an example</b>

I want to extract the string "This is an example" and convert it into other language and then again append the string with bold tag.

Could anyone know how to proceed with this?

Regards Rekha

+1  A: 
$text = '<b>This is an example</b>';
$strippedText = strip_tags($text);
echo $strippedText; // This is an example
hsz
+3  A: 

The simplest way is to just use DOM parsing to get the contents of the HTML tags. However, you need to specify which tags you want to get the contents for. For example, you wouldn't want the contents of table or tr, but you may want the contents of td. Below is an example of how you would get the contents of all the b tags and replace the text between them.

$dom_doc = new DOMDocument();
$html_file = file_get_content('file.html');
// The next line will likely generate lots of warnings if your html isn't perfect
// Put an @ in front to suppress the warnings once you review them
$dom_doc->loadHTML( $html_file );
// Get all references to <b> tag
$tags_b = $dom_doc->getElementsByTagName('b');
// Extract text value and replace with something else
foreach($tags_b as $tag) {
    $tag_value = $tag->nodeValue;
    // get translation of tag_value
    $translated_val = get_translation_from_google();
    $tag->nodeValue = $translated_val
}
// save page with translated text
$translated_page = $dom_doc->saveHTML();
Brent Baisley
+1  A: 

Here's a working example of using the Google API to translate server side. The function takes a string as an input and strips HTML tags from it before translating.

The languages are passed in as arguments.

<?php
// This function translates a string $source written in the $fromLang languages to the $toLang language.
function translateTexts($source, $fromLang, $toLang)
{

    /* Language choices: 'AFRIKAANS' : 'af', 'ALBANIAN' : 'sq', 'AMHARIC' : 'am', 'ARABIC' : 'ar', 'ARMENIAN' : 'hy', 'AZERBAIJANI' : 'az', 'BASQUE' : 'eu', 'BELARUSIAN' : 'be', 'BENGALI' : 'bn', 'BIHARI' : 'bh', 'BRETON' : 'br', 'BULGARIAN' : 'bg', 'BURMESE' : 'my', 'CATALAN' : 'ca', 'CHEROKEE' : 'chr', 'CHINESE' : 'zh', 'CHINESE_SIMPLIFIED' : 'zh-CN', 'CHINESE_TRADITIONAL' : 'zh-TW', 'CORSICAN' : 'co', 'CROATIAN' : 'hr', 'CZECH' : 'cs', 'DANISH' : 'da', 'DHIVEHI' : 'dv', 'DUTCH': 'nl',  'ENGLISH' : 'en', 'ESPERANTO' : 'eo', 'ESTONIAN' : 'et', 'FAROESE' : 'fo', 'FILIPINO' : 'tl', 'FINNISH' : 'fi', 'FRENCH' : 'fr', 'FRISIAN' : 'fy', 'GALICIAN' : 'gl', 'GEORGIAN' : 'ka', 'GERMAN' : 'de', 'GREEK' : 'el', 'GUJARATI' : 'gu', 'HAITIAN_CREOLE' : 'ht', 'HEBREW' : 'iw', 'HINDI' : 'hi', 'HUNGARIAN' : 'hu', 'ICELANDIC' : 'is', 'INDONESIAN' : 'id', 'INUKTITUT' : 'iu', 'IRISH' : 'ga', 'ITALIAN' : 'it', 'JAPANESE' : 'ja', 'JAVANESE' : 'jw', 'KANNADA' : 'kn', 'KAZAKH' : 'kk', 'KHMER' : 'km', 'KOREAN' : 'ko', 'KURDISH': 'ku', 'KYRGYZ': 'ky', 'LAO' : 'lo', 'LATIN' : 'la', 'LATVIAN' : 'lv', 'LITHUANIAN' : 'lt', 'LUXEMBOURGISH' : 'lb', 'MACEDONIAN' : 'mk', 'MALAY' : 'ms', 'MALAYALAM' : 'ml', 'MALTESE' : 'mt', 'MAORI' : 'mi', 'MARATHI' : 'mr', 'MONGOLIAN' : 'mn', 'NEPALI' : 'ne', 'NORWEGIAN' : 'no', 'OCCITAN' : 'oc', 'ORIYA' : 'or', 'PASHTO' : 'ps', 'PERSIAN' : 'fa', 'POLISH' : 'pl', 'PORTUGUESE' : 'pt', 'PORTUGUESE_PORTUGAL' : 'pt-PT', 'PUNJABI' : 'pa', 'QUECHUA' : 'qu', 'ROMANIAN' : 'ro', 'RUSSIAN' : 'ru', 'SANSKRIT' : 'sa', 'SCOTS_GAELIC' : 'gd', 'SERBIAN' : 'sr', 'SINDHI' : 'sd', 'SINHALESE' : 'si', 'SLOVAK' : 'sk', 'SLOVENIAN' : 'sl', 'SPANISH' : 'es', 'SUNDANESE' : 'su', 'SWAHILI' : 'sw', 'SWEDISH' : 'sv', 'SYRIAC' : 'syr', 'TAJIK' : 'tg', 'TAMIL' : 'ta', 'TATAR' : 'tt', 'TELUGU' : 'te', 'THAI' : 'th', 'TIBETAN' : 'bo', 'TONGA' : 'to', 'TURKISH' : 'tr', 'UKRAINIAN' : 'uk', 'URDU' : 'ur', 'UZBEK' : 'uz', 'UIGHUR' : 'ug', 'VIETNAMESE' : 'vi', 'WELSH' : 'cy', 'YIDDISH' : 'yi', 'YORUBA' : 'yo', 'UNKNOWN' : ''  */

    // Creating the query URL
    $url = "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&amp;q=" . urlencode($source) . "&langpair=" . $fromLang . "%7C" . $toLang;

    // send translation request
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     
    $response = curl_exec($ch);
    curl_close($ch);

    // now, process the JSON string
    $json = json_decode($response, true);

    // If response status is okay
    if ($json['responseStatus'] == 200)
    {
        $translated = $json['responseData']['translatedText'];
    } else 
    {
        $translated = "****Error. Couldn't translate.****";
    }     
    // return translated text
    return $translated;
}

// Get the string you want to translate
$string = "<b>This is an example</b>";
// Strip the HTML tags from the strip and translate it.
echo translateTexts(strip_tags($string), 'en', 'es');

?>

When you run the code above, you should add in a proper self identification to the header.

References:

Google Translation API section for Flash and Non-Javascript Interfaces
PHP cUrl examples
json_decode()
strip_tags()

Peter Ajtai
Thanks for your code.It is working fine for me.
Rekha
@rekhasatjvika - That's great! If the code is working, and it solved your problem, you should accept it by clicking on the outline of the check mark right under the up and down votes for the answer.
Peter Ajtai