questions about chinese

How to do a Python split() on languages (like Chinese) that don't use whitespace as a word separator?

I want to split a sentence into a list of words. For English and European languages this is easy: just use split(). >>> "This is a sentence.".split() ['This', 'is', 'a', 'sentence.'] But I also need to deal with sentences in languages such as Chinese that don't use whitespace as a word separator. >>> u"这是一个句子".split() [u'\u8fd9\u662f\u...

python

string

unicode

nlp

chinese
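The problem in the question above can be illustrated with a minimal sketch of greedy forward maximum matching, one common dictionary-based way to segment text that has no whitespace. The function name `max_match`, the `max_len` parameter, and the four-entry vocabulary are hypothetical and exist only for illustration; a real segmenter would need a large dictionary or a statistical model. The sketch is written for Python 3, where every `str` is already Unicode.

```python
def max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest slice (up to max_len characters)
    that appears in vocab; fall back to a single character otherwise.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a last resort.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy vocabulary, just large enough for the example sentence.
TOY_VOCAB = {"这", "是", "一个", "句子"}

# str.split() finds no whitespace, so the sentence comes back as one item.
print("这是一个句子".split())
print(max_match("这是一个句子", TOY_VOCAB))
```

With an empty vocabulary the function degrades to splitting the sentence into individual characters, which is sometimes an acceptable fallback.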

Python: any way to perform this "hybrid" split() on multi-lingual (e.g. Chinese & English) strings?

I have multilingual strings that consist of both languages that use whitespace as a word separator (English, French, etc.) and languages that don't (Chinese, Japanese, Korean). Given such a string, I want to separate the English/French/etc. part into words using whitespace as the separator, and to separate the Chinese/Japanese/Korean part ...
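A rough sketch of such a hybrid split, assuming that it is acceptable to break the Chinese/Japanese/Korean part into individual characters (the excerpt is cut off before it states the desired behaviour). The function name `hybrid_split` and the three Unicode ranges are assumptions chosen for illustration; the ranges cover only the most common CJK blocks and leave out extensions and punctuation.

```python
import re

# Assumed ranges: CJK Unified Ideographs, Hiragana and Katakana,
# Hangul syllables. Rarer blocks are deliberately left out.
_CJK = "\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"

# A token is either one CJK character, or a run of characters that are
# neither whitespace nor CJK.
_TOKEN = re.compile(f"[{_CJK}]|[^\\s{_CJK}]+")


def hybrid_split(text):
    """Split on whitespace, and additionally isolate each CJK character."""
    return _TOKEN.findall(text)


print(hybrid_split("Testing English text我爱Python"))
```

Because the pattern uses `findall` rather than `split`, whitespace is dropped and no empty strings appear in the result.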

Are email addresses allowed to contain non-alphanumeric characters?

I'm building a website using Django. The website could have a significant number of users from non-English-speaking countries. I just want to know whether there are any technical restrictions on which characters an email address can contain. Are email addresses only allowed to contain letters of the English alphabet + numbers + "_" + "@" + "."? Are they al...

Ruby on Rails Chinese Dictionary in Rails and MySQL?

Any help and advice deeply appreciated. I want to create a RoR website that is a Chinese dictionary with 2 million entries (records in the SQLite database). Each record has 3 fields: a long Chinese word, an English text, and an integer counter (incremented by 1 every time the word is looked up). The user inputs a Chinese word, the...

ruby-on-rails

sqlite3

chinese

How to read Asian characters (Japanese, Chinese) after json_encode in PHP

I've read every post on the topic, but I don't think I've found an answer to my question, and it's driving me crazy. I have a couple of PHP files: one stores data in a MySQL database, and another reads that data. I get data from all over the world, and it seems that I store Asian characters correctly, but when I try to read those...

Embedding and Displaying Chinese/Japanese

Hi all, I have been working on a subtitles engine for flash/flv video player. On my Mac everything is great, nice aliased glyphs, displaying all the characters, etc. Switch to windows, it all goes out the window. Some machines with Eastern Characters enabled display fine, but I can't guarantee all users will have this option selected. ...

Please help in translating this Chinese text

Could someone help translate the following text from the image? It's Chinese. Thanks in advance ...

chinese

Chinese handwriting recognition program for iPhone

I would like to start on a Chinese handwriting recognition program for iPhone, but I couldn't find any library or API that can help me do so. It's hard for me to write the algorithm myself because of time constraints. Some suggestions recommended that I should make use of a back-end server to do the recognition work. But I don't know h...

How do I implement full text search in Chinese on PostgreSQL?

The title says it all. This question has been asked before: http://stackoverflow.com/questions/2495997/postgresql-full-text-search-in-postgresql-japanese-chinese-arabic but there are no answers for Chinese as far as I can see. I took a look at the OpenOffice wiki, and it doesn't have a dictionary for Chinese. Edit: As we are already ...

postgresql

full-text-search

chinese

LaTeX Question - Accents on characters

I refuse to believe that no one on Stack Overflow can help me! Tone marks above Chinese characters in LaTeX / Combining Accents in Unicode. My aim is to put tone marks above Chinese characters in LaTeX, and Google does not seem to be letting on to the answer. Is it possible to use combining accents with Chinese characters or can they only b...

Load a Chinese Web (CHM) URL using UIWebView on iPhone?

I am trying to use UIWebView to load a Chinese web URL. If I encode it with UTF-8 then it becomes: html/libfunctions/%CA%FD%D7%E9%B9%DC%C0%ED(Array).htm -- xchm://03000000-0A00-0400-F25E-D84B09001600/ which cannot be loaded from UIWebView. If I use the default Chinese encoding instead: html/libfunctions/录脝脢卤脝梅鹿脺脌铆(Timers).htm, th...
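One possibility, which the excerpt does not confirm, is that the percent-escaped bytes in the first path are GBK rather than UTF-8. The short Python sketch below checks that hypothesis with the standard library. It only illustrates how to test an encoding guess and is not the UIWebView fix itself; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Escaped file-name fragment copied from the path in the question.
escaped = "%CA%FD%D7%E9%B9%DC%C0%ED"

# Assumption under test: the bytes are GBK, not UTF-8.
name = unquote(escaped, encoding="gbk")
print(name)

# Re-encoding with GBK should reproduce the original escapes.
print(quote(name, encoding="gbk") == escaped)

# The same name percent-encoded as UTF-8, for comparison.
print(quote(name, encoding="utf-8"))
```

If the guess is right, the decoded name is readable Chinese and the round trip reproduces the original escapes exactly.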

How to avoid default URL encoding in ASP.NET MVC Html Helpers like RouteLink

I want my URL to look like this: "http://domain.com/tag/高兴" My route mapping: routes.MapRoute("Tag", "tag/{name}", new { controller = "Tag", action="Index" }); But Html.RouteLink encodes the parameters by default. If I use Html.RouteLink in my View, the generated HTML is: <a href="/tag/%E9%AB%98%E5%85%B4">高兴</a> Is there any way to a...
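For context, the escaped href above is the UTF-8 percent-encoding of the same tag name, so both forms identify the same path. The sketch below demonstrates the equivalence in Python rather than ASP.NET MVC. It does not show how to change Html.RouteLink's behaviour, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

tag = "高兴"

# Percent-encode the tag as UTF-8, as the generated link does.
encoded = quote(tag)
print("/tag/" + encoded)

# Decoding the escapes gives back the original characters.
print("/tag/" + unquote(encoded))
```

The first line printed matches the href in the generated HTML, and the second matches the desired form.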