Is there a way (a program, a library) to approximately know which language a document is written in?
I have a bunch of text documents (~500K) in mixed languages to import into an i18n-enabled CMS (Drupal).
I don't need perfect matches, only some guess.
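When a rough guess is enough, one simple approach is to count how many common function words ("stopwords") of each candidate language appear in the text. The sketch below is only an illustration: the stopword lists are tiny and hand-picked (an assumption, not taken from any library), and the names `STOPWORDS` and `guess_language` are hypothetical. A production tool would more likely use trained character n-gram profiles.

```python
import re
from collections import Counter

# Tiny hand-picked stopword lists, for illustration only (assumption).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "this", "a", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "mit"},
    "es": {"el", "la", "los", "y", "es", "de", "que", "una", "un", "esto"},
    "fr": {"le", "la", "les", "et", "est", "de", "que", "une", "un", "des"},
}


def guess_language(text):
    """Return the language code with the most stopword hits, or None."""
    # Split into lowercase runs of Unicode letters (no digits, no underscores).
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = Counter()
    for lang, stops in STOPWORDS.items():
        scores[lang] = sum(1 for w in words if w in stops)
    lang, hits = scores.most_common(1)[0]
    # No stopword matched at all: refuse to guess.
    return lang if hits > 0 else None


if __name__ == "__main__":
    print(guess_language("This is a sentence"))
    print(guess_language("Esto es una sentencia"))
```

Very short texts may contain no stopwords at all, in which case this sketch returns `None` rather than guessing.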
...
The basics have already been answered here. But is there a pre-built PHP lib doing the same as Lingua::Identify from CPAN?
...
I'm thinking of doing multiple language versions of my website (e.g. English and German). I'd like to offer a reasonable default based on the user's language.
What's the easiest and least obtrusive way to do that?
EDIT: The ideal solution would be not to use any server-side technology, but to encode everything in the HTML files. Curre...
I have a program that reads a bunch of text and analyzes it. The text may be in any language, but I need to test for Japanese and Chinese specifically so that I can analyze them differently.
I have read that I can test each character by its Unicode code point to find out whether it is in the range of CJK characters. This is helpful, however I would l...
What's the best way to return the language of a given string? Using an encoding trick or something similar?
Thanks
...
Is there any C# library which can detect the language of a particular piece of text? i.e. for an input text "This is a sentence", it should detect the language as "English". Or for "Esto es una sentencia" it should detect the language as "Spanish".
I understand that language detection from text is not a deterministic problem. But both G...
Hello,
I've got an input box that allows UTF-8 characters -- can I detect programmatically whether the characters are Chinese, Japanese, or Korean (part of some Unicode range, perhaps)? I would change search methods depending on whether MySQL's fulltext searching would work (it won't work for CJK characters).
Thanks!
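The Unicode-range idea can be sketched roughly as follows. The block ranges used are the standard Unicode ones (CJK Unified Ideographs plus Extension A, Hiragana and Katakana, Hangul Syllables and Jamo), but the function names `cjk_scripts` and `guess_cjk` are hypothetical, and the rule "kana means Japanese" is a heuristic assumption. Text made only of Han characters cannot be told apart this way, so the sketch reports it as `"han_only"` instead of guessing Chinese.

```python
def cjk_scripts(text):
    """Return the set of CJK scripts seen in text: 'han', 'kana', 'hangul'."""
    found = set()
    for ch in text:
        cp = ord(ch)
        # CJK Unified Ideographs and Extension A
        if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            found.add("han")
        # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF)
        elif 0x3040 <= cp <= 0x30FF:
            found.add("kana")
        # Hangul Syllables and Hangul Jamo
        elif 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF:
            found.add("hangul")
    return found


def guess_cjk(text):
    """Heuristic guess: 'japanese', 'korean', 'han_only', or None."""
    scripts = cjk_scripts(text)
    if "kana" in scripts:
        return "japanese"
    if "hangul" in scripts:
        return "korean"
    if "han" in scripts:
        # Chinese, or Japanese written without any kana: not distinguishable.
        return "han_only"
    return None


if __name__ == "__main__":
    for sample in ("これはテキストです", "한국어", "这是一些文字", "plain text"):
        print(sample, guess_cjk(sample))
```

A search layer could then fall back to a non-fulltext method whenever `guess_cjk` returns anything other than `None`.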
...
I have a form which lets users input text snippets. How can I figure out the language of the entered text?
Specifically these languages for now:
Arabic: هذه هي بعض النصوص العربية
Chinese: 这是一些阿拉伯文字
Japanese: これは、いくつかのアラビア語のテキストです
[Edit] The detection has to work on text that is retrieved via an API too (no browser involved).
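For these three languages the script alone carries most of the signal, so a server-side check needs no browser. The sketch below counts letters by the first word of their Unicode character name (for example `ARABIC`, `CJK`, `HIRAGANA`), using Python's standard `unicodedata` module. The function names and the mapping to `"ar"`, `"zh"`, `"ja"` are assumptions for illustration; other languages written in the same scripts (Persian in Arabic script, for instance) would be mislabeled.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters by the first word of their Unicode character name."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def guess_from_script(text):
    """Return 'ar', 'zh', 'ja', or None based on the scripts present."""
    counts = script_counts(text)
    if not counts:
        return None
    # Any kana at all is treated as Japanese (heuristic assumption).
    if counts["HIRAGANA"] or counts["KATAKANA"]:
        return "ja"
    top = counts.most_common(1)[0][0]
    return {"ARABIC": "ar", "CJK": "zh"}.get(top)


if __name__ == "__main__":
    print(guess_from_script("هذه هي بعض النصوص العربية"))
    print(guess_from_script("这是一些阿拉伯文字"))
    print(guess_from_script("これは、いくつかのアラビア語のテキストです"))
```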
...
Hi guys,
Are there any good, open source engines out there for detecting what language a text is in, perhaps with a probability metric? One that I can run locally and that doesn't query Google or Bing? I'd like to detect the language of each page in about 15 million pages of OCR'ed text.
Cheers
Nik
...
I'm looking for a library or technique to detect the input language of blocks of text provided by users. Online lookups (like Google translate) won't work for this task as I'm writing an app which must run offline.
Thanks.
...
Hi everyone,
Say I have a textbox in HTML using the following code:
<input type="text" name="text" id="text" />
And my site is intended to be for right-to-left as well as left-to-right languages. That means that I have some textboxes that will be typed in a right-to-left language, but the email textbox, for example, will be left-to-r...
I'm using $_SERVER["HTTP_ACCEPT_LANGUAGE"] to detect the browser language, which works fine for Firefox and IE:
Firefox:
de,en-us;q=0.9,en;q=0.7,ru;q=0.6,ro;q=0.4,hu;q=0.3,zh;q=0.1
Internet Explorer:
de
Unfortunately, it doesn't work for the following browsers:
Opera:
en,en-US;q=0.9,ja;q=0.8,fr;q=0.7,de;q=0.6,es;q=0.5,it;q=0.4,pt;q=0...
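All of the header values shown above follow the same `Accept-Language` syntax: a comma-separated list of language tags, each with an optional `q` weight that defaults to 1. A parser that sorts by weight handles them uniformly. The following is a minimal sketch in Python rather than PHP; the function names `parse_accept_language` and `pick_language` and the `supported` set are hypothetical, and the matching rule (exact tag first, then primary subtag) is a simplifying assumption.

```python
def parse_accept_language(header):
    """Return [(tag, q), ...] sorted by descending q, keeping header order on ties."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q, position))
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs]


def pick_language(header, supported, default):
    """Pick the best supported language code, or fall back to default."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "en-us" -> "en"
        if primary in supported:
            return primary
    return default


if __name__ == "__main__":
    firefox = "de,en-us;q=0.9,en;q=0.7,ru;q=0.6,ro;q=0.4,hu;q=0.3,zh;q=0.1"
    print(pick_language(firefox, {"en", "de"}, "en"))
```

Because an entry without a `q` value is treated as weight 1, a bare `de` and a long weighted list go through the same code path.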
Is there a way to detect the language of the data being entered via the input field?
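For the right-to-left versus left-to-right case, the full language is often not needed: the direction of the first strongly directional character is usually enough. This sketch uses `unicodedata.bidirectional` from Python's standard library, where `R` and `AL` mark right-to-left letters (Hebrew, Arabic) and `L` marks left-to-right ones. The function name `text_direction` is hypothetical, and the first-strong-character rule is a heuristic that can misjudge mixed text.

```python
import unicodedata


def text_direction(text):
    """Return 'rtl', 'ltr', or None based on the first strong character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi == "L":          # left-to-right letter
            return "ltr"
    # Only digits, punctuation, whitespace, or empty input: no strong direction.
    return None


if __name__ == "__main__":
    print(text_direction("هذه هي بعض النصوص العربية"))
    print(text_direction("user@example.com"))
```

The result could be used server-side to set a `dir` attribute on the stored value, while an email field simply keeps a fixed left-to-right direction.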
...
Hi,
I am trying to use the Google language detection API. Right now I am using the sample from the Google documentation, as follows:
public static String googleLangDetection(String str) throws IOException, JSONException {
    String urlStr = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=";
    // ...