ansaurus

Question

detect language from string in PHP

Answer 1

+7 A:

You cannot detect the language from the character type alone, and there is no foolproof way to do this.

With any method, you're just making an educated guess. There are some math-related articles on the subject out there.

Ólafur Waage 2009-09-17 22:08:27

Answer 2

+1 A:

One approach might be to break the input string into words and then look up those words in an English dictionary to see how many of them are present. This approach has a few limitations:

proper nouns may not be handled well
spelling errors can disrupt your lookups
abbreviations like "lol" or "b4" won't necessarily be in the dictionary
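A minimal sketch of the dictionary lookup described above, in PHP. The function name `dictionaryHitRatio` and the tiny `$english` word list are made up for illustration; a real word list would be loaded from a dictionary file, and the limitations listed above still apply.

```php
<?php
// Hypothetical helper: fraction of the words in $text that appear in $dictionary.
function dictionaryHitRatio(string $text, array $dictionary): float
{
    // Lowercase the input (multibyte-aware when the mbstring extension is available).
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);

    // Split on anything that is not a letter or an apostrophe.
    $words = preg_split("/[^\\p{L}']+/u", $text, -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return 0.0; // no words found, or the input was not valid UTF-8
    }

    $lookup = array_flip($dictionary); // word => index, for fast isset() checks
    $hits = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $hits++;
        }
    }
    return $hits / count($words);
}

// Stand-in word list (an assumption, not a real dictionary).
$english = ['the', 'is', 'where', 'bathroom', 'a', 'of', 'and', 'to'];

echo dictionaryHitRatio('Where is the bathroom?', $english), "\n";
echo dictionaryHitRatio('¿Dónde está el baño?', $english), "\n";
```

A ratio near 1 suggests English and a ratio near 0 suggests something else; where to put the cutoff is a judgment call.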

Greg Hewgill 2009-09-17 22:11:55

"lol" is an acronym. =]

strager 2009-09-17 22:13:49

@strager: and an acronym is a type of abbreviation: http://en.wiktionary.org/wiki/acronym :)

Greg Hewgill 2009-09-17 22:24:39

Answer 3

+2 A:

You can probably use the Google Translate API to detect the language and translate it if necessary.

strager 2009-09-17 22:22:00

Answer 4

+9 A:

You could do this entirely client side with Google's AJAX Language API.

With the AJAX Language API, you can translate and detect the language of blocks of text within a webpage using only Javascript. In addition, you can enable transliteration on any textfield or textarea in your web page. For example, if you were transliterating to Hindi, this API will allow users to phonetically spell out Hindi words using English and have them appear in the Hindi script.

You can automatically detect a string's language:

var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    // Map the returned language code back to its name in google.language.Languages.
    var language = 'unknown';
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language;
  }
});

And translate any string written in one of the supported languages:

google.language.translate("Hello world", "en", "es", function(result) {
  if (!result.error) {
    var container = document.getElementById("translation");
    container.innerHTML = result.translation;
  }
});

voyager 2009-09-17 22:24:01

Answer 5

+1 A:

Perhaps submit the string to this language guesser:

http://www.xrce.xerox.com/competencies/content-analysis/tools/guesser

Andy 2009-09-17 22:24:54

Answer 6

+1 A:

I would take documents from various languages and count which Unicode characters they use. You could then use some Bayesian reasoning to determine which language a string is in by just the Unicode characters used. This would separate French from English or Russian.

I am not sure what else could be done except looking up the words in language dictionaries to determine the language (using a similar probabilistic approach).
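One way to sketch the character-based Bayesian idea in PHP is a naive Bayes classifier over letter counts. Everything here is an assumption for illustration: the function names, the add-one smoothing, the equal prior for each language, and the one-sentence training samples, which are far too small for real use.

```php
<?php
// Count how often each letter occurs in a text.
function charCounts(string $text): array
{
    $text = function_exists('mb_strtolower') ? mb_strtolower($text, 'UTF-8') : strtolower($text);
    preg_match_all('/\p{L}/u', $text, $matches);
    return array_count_values($matches[0]);
}

// Naive Bayes with add-one smoothing and equal priors: returns the language
// whose letter distribution gives the input the highest log-likelihood.
function guessLanguage(string $text, array $samples): ?string
{
    $input = charCounts($text);
    if (!$input) {
        return null; // nothing to classify
    }

    $models = array_map('charCounts', $samples); // keys (language names) are preserved
    $vocab = [];
    foreach ($models as $counts) {
        $vocab += $counts; // union of all letters seen in training
    }
    $vocabSize = count($vocab) + 1; // +1 bucket for letters never seen in training

    $best = null;
    $bestScore = -INF;
    foreach ($models as $lang => $counts) {
        $total = array_sum($counts);
        $score = 0.0;
        foreach ($input as $char => $n) {
            $seen = $counts[$char] ?? 0;
            $score += $n * log(($seen + 1) / ($total + $vocabSize));
        }
        if ($score > $bestScore) {
            $best = $lang;
            $bestScore = $score;
        }
    }
    return $best;
}

// Toy training samples (assumptions; use whole documents in practice).
$samples = [
    'english' => 'the quick brown fox jumps over the lazy dog and where is the bathroom',
    'french'  => 'où est la bibliothèque le garçon a mangé une pêche près de la forêt',
    'russian' => 'где находится туалет быстрая коричневая лиса прыгает через ленивую собаку',
];

echo guessLanguage('привет как дела', $samples), "\n";
```

Languages with different scripts separate easily this way. Languages that share an alphabet need much larger samples, and probably character pairs or triples rather than single letters.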

MathGladiator 2009-09-20 01:33:06

Answer 7

+2 A:

I've used the Text_LanguageDetect PEAR package with reasonable results. It's dead simple to use, and it has a modest 52-language database. The downside is that it does not detect East Asian languages.

require_once 'Text/LanguageDetect.php';
$l = new Text_LanguageDetect();
$result = $l->detect($text, 4);
if (PEAR::isError($result)) {
    echo $result->getMessage();
} else {
    print_r($result);
}

results in:

Array
(
    [german] => 0.407037037037
    [dutch] => 0.288065843621
    [english] => 0.283333333333
    [danish] => 0.234526748971
)
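The scores above are fairly close together, so it may be worth treating a narrow win as "unknown". The sketch below works only on the result array and does not call the package; `pickLanguage` and the 0.1 margin are hypothetical choices, and it assumes a higher score means a closer match, as the ordering of the output suggests.

```php
<?php
// Hypothetical helper: accept the top-scoring language only when it beats the
// runner-up by at least $minMargin; otherwise return null (ambiguous).
function pickLanguage(array $scores, float $minMargin = 0.1): ?string
{
    if (!$scores) {
        return null;
    }
    arsort($scores); // highest score first, keys preserved
    $languages = array_keys($scores);
    $values = array_values($scores);
    if (count($values) > 1 && $values[0] - $values[1] < $minMargin) {
        return null; // the top two candidates are too close to call
    }
    return $languages[0];
}

// The scores from the output above.
$scores = [
    'german'  => 0.407037037037,
    'dutch'   => 0.288065843621,
    'english' => 0.283333333333,
    'danish'  => 0.234526748971,
];

var_dump(pickLanguage($scores));       // accepts "german" at the default margin
var_dump(pickLanguage($scores, 0.15)); // rejects it as ambiguous at a stricter margin
```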

scott 2010-04-05 00:10:13

Answer 8

A:

You can use the API of the Lang ID service: http://langid.net/identify-language-from-api.html

2010-10-30 10:37:00
