language-detection

(human) Language of a document

Is there a way (a program, a library) to approximately determine which language a document is written in? I have a bunch of text documents (~500K) in mixed languages to import into an i18n-enabled CMS (Drupal). I don't need perfect matches, only a rough guess. ...

How to detect the language of a document - in PHP?

The basics have already been answered here. But is there a pre-built PHP lib doing the same as Lingua::Identify from CPAN? ...

Detect web user's language, e.g. in JavaScript?

I'm thinking of doing multiple language versions of my website (e.g. English and German). I'd like to offer a reasonable default based on the user's language. What's the easiest and least obtrusive way to do that? EDIT: The ideal solution would be not to use any server-side technology, but to encode everything in the HTML files. Curre...

Testing for Japanese/Chinese Characters in a string

I have a program that reads a bunch of text and analyzes it. The text may be in any language, but I need to test for Japanese and Chinese specifically to analyze them in a different way. I have read that I can test each character's Unicode number to find out if it is in the range of CJK characters. This is helpful, however I would l...
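
The range test mentioned in this question can be sketched as follows. This is a minimal illustration, not a complete solution: the function names are hypothetical, and the ranges listed are only the main Unicode ideograph blocks (further extension blocks exist). Han ideographs are shared by Chinese and Japanese, so this check alone cannot tell the two apart.

```python
# Main CJK ideograph blocks (assumption: a deliberately partial list).
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)

def is_cjk_ideograph(ch):
    """True if the single character ch falls in one of the blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)

def contains_cjk(text):
    """True if any character of text is a CJK ideograph."""
    return any(is_cjk_ideograph(ch) for ch in text)
```

Kana and Hangul live in separate blocks and are intentionally not matched here.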

Return the language of a given string

What's the best way to return the language of a given string? Using an encoding trick or something? Thanks ...

Detect language of text

Is there any C# library which can detect the language of a particular piece of text? i.e. for an input text "This is a sentence", it should detect the language as "English". Or for "Esto es una sentencia" it should detect the language as "Spanish". I understand that language detection from text is not a deterministic problem. But both G...
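
One rough way to get a guess for inputs like the two sentences above is to count common function words per language. The sketch below is a toy heuristic in Python, not what any particular library does: the word lists and the `guess_language` name are illustrative assumptions, and it will fail on short or unusual text.

```python
import re

# Tiny hand-picked stopword lists (assumption: illustrative only, far from complete).
STOPWORDS = {
    "en": {"the", "is", "a", "of", "and", "to", "in", "this", "that", "it"},
    "es": {"el", "la", "es", "una", "un", "de", "y", "que", "en", "esto"},
}

def guess_language(text):
    """Return the code of the language whose stopwords occur most often, or None."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Because the scores are simple counts, the result is only a ranking, not a calibrated probability.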

Detect CJK characters in PHP

Hello, I've got an input box that allows UTF-8 characters. Can I detect programmatically whether the characters are Chinese, Japanese, or Korean (part of some Unicode range, perhaps)? I would change search methods depending on whether MySQL's fulltext searching would work (it won't work for CJK characters). Thanks! ...

How to detect language of text?

I have a form which lets users input text snippets. So how can I figure out the language of the entered text? Specifically these languages for now: Arabic: هذه هي بعض النصوص العربية Chinese: 这是一些阿拉伯文字 Japanese: これは、いくつかのアラビア語のテキストです [Edit] The detection has to work on text which is retrieved via an API too (no browser involved) ...
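
For these three languages specifically, the writing system already carries most of the signal, so a script count can serve as a first guess. The following is a sketch under stated assumptions: the `guess_by_script` name is hypothetical, only the basic Arabic, kana, and main Han blocks are checked, and any kana at all is taken to mean Japanese.

```python
def guess_by_script(text):
    """Guess 'ar', 'zh', or 'ja' from the Unicode blocks used; None if no match."""
    counts = {"arabic": 0, "kana": 0, "han": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0600 <= cp <= 0x06FF:      # Arabic block
            counts["arabic"] += 1
        elif 0x3040 <= cp <= 0x30FF:    # Hiragana and Katakana blocks
            counts["kana"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs
            counts["han"] += 1
    if counts["kana"]:
        return "ja"                      # kana is specific to Japanese
    if counts["han"] or counts["arabic"]:
        return "zh" if counts["han"] >= counts["arabic"] else "ar"
    return None
```

The heuristic has known blind spots: Arabic script is also used for Persian and Urdu, and a Japanese snippet written entirely in kanji would be reported as Chinese.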

How to detect language

Hi guys, Are there any good, open source engines out there for detecting what language a text is in, perhaps with a probability metric? One that I can run locally and that doesn't query Google or Bing? I'd like to detect the language of each page in about 15 million pages of OCR'ed text. Cheers, Nik ...

How can I detect a user's input language using Ruby without using an online service?

I'm looking for a library or technique to detect the input language of blocks of text provided by users. Online lookups (like Google Translate) won't work for this task as I'm writing an app which must run offline. Thanks. ...

Language recognition and automatic textbox direction switch

Hi everyone, Say I have a textbox in HTML using the following code: <input type="text" name="text" id="text" /> And my site is intended to be for right-to-left as well as left-to-right languages. That means that I have some textboxes that will be typed in a right-to-left language, but the email textbox, for example, will be left-to-r...
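
One common heuristic for choosing a text direction is to look at the first character with a strong bidirectional class. The sketch below shows the idea in Python using the standard `unicodedata` module; the `text_direction` name and the `ltr` fallback are assumptions for illustration, and a browser-side version would need the same logic in JavaScript.

```python
import unicodedata

def text_direction(text, default="ltr"):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # right-to-left letters (Hebrew, Arabic, ...)
            return "rtl"
        if bidi == "L":           # left-to-right letters
            return "ltr"
    # Only digits, spaces, or punctuation so far: fall back to the default.
    return default
```

Digits and punctuation are skipped because they have weak or neutral direction, so `"123 مرحبا"` is still classified as right-to-left.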

Language identification (Opera, Safari, Chrome) in PHP

I'm using $_SERVER["HTTP_ACCEPT_LANGUAGE"] to detect the browser language, which works fine for Firefox and IE: Firefox: de,en-us;q=0.9,en;q=0.7,ru;q=0.6,ro;q=0.4,hu;q=0.3,zh;q=0.1 Internet Explorer: de Unfortunately, it doesn't work for the following browsers: Opera: en,en-US;q=0.9,ja;q=0.8,fr;q=0.7,de;q=0.6,es;q=0.5,it;q=0.4,pt;q=0...
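
Header values like the ones quoted above are comma-separated language tags, each with an optional `q` weight that defaults to 1. A minimal parsing sketch, written in Python rather than PHP, with hypothetical function names and a simplified matching rule (primary subtag only):

```python
from typing import Iterable, List, Tuple

def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (tag, q) pairs sorted by descending q; ties keep header order."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0                           # a missing q parameter means weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0                   # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), q))
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def pick_language(header: str, supported: Iterable[str], default: str = "en") -> str:
    """Pick the first preferred language whose primary subtag is supported."""
    supported = set(supported)
    for tag, q in parse_accept_language(header):
        primary = tag.split("-")[0]
        if q > 0 and primary in supported:
            return primary
    return default
```

For a site offering German and English, `pick_language(header, {"de", "en"})` would be the call; comparing only the primary subtag means `en-US` and `en` are treated alike.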

PHP: How do I detect if an input string is Arabic

Is there a way to detect the language of the data being entered via the input field? ...

Google Language detection api replying error code 406

Hi, I am trying to use the Google language detection API. Right now I am using the sample available in the Google documentation, as follows: public static String googleLangDetection(String str) throws IOException, JSONException{ String urlStr = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q="; // ...