chinese

Split a sentence into separate words

Hi, guys! I need to split a Chinese sentence into separate words. The problem with Chinese is that there are no spaces. For example, the sentence may look like: 主楼怎么走 (with spaces it would be: 主楼 怎么 走). At the moment I can think of one solution. I have a dictionary with Chinese words (in a database). The script will: 1) try to find th...
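
One common way to do a dictionary-based split is greedy forward maximum matching: at each position, take the longest dictionary word that starts there. The sketch below is only an illustration of that idea, not necessarily the approach the truncated steps above describe. The `segment` function, the `max_word_len` parameter, and the toy in-memory dictionary are all assumptions; in the question the words live in a database.

```python
def segment(sentence, dictionary, max_word_len=4):
    """Split a sentence with greedy forward maximum matching.

    `dictionary` is any container supporting `in` (a set here).
    Characters that match no dictionary word are emitted one by one.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary standing in for the database lookup.
toy_dictionary = {"主楼", "怎么", "走"}
print(" ".join(segment("主楼怎么走", toy_dictionary)))
```

Greedy matching can pick the wrong split when words overlap, so real segmenters usually add word frequencies or a statistical model on top of this.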

Unable to export Thai characters into Excel

Using this code <%@ page language="java" contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%> <%@page import="java.io.*"%> <%@page import="com.db.action.SearchFormDBImage"%> <%@ page import=" java.util.*"%> <%@page import ="org.apache.poi.hssf.usermodel.HSSFSheet"%> <%@page import ="org.apache.poi.hssf.usermodel.H...

Form scaling issue on Chinese OS (96 dpi)

Hi all, I have a sample .NET application which consists of two forms. I have used images and various controls on these forms. When I run this application under the English version of XP or Windows 7, it works fine. But when I run it under a Chinese version of the OS, the form size changes. It increases the form size, causing distorted forms. ...

wxPython GUI with static Japanese and Chinese text

Hi all, we want to support localization of the static text (labels, button labels, etc.) to Japanese and Chinese in wxPython. We only want the static text within the GUI elements to be changed; hard-coding Japanese or Chinese characters in the labels (static text fields) would do the work for us. Any help on how to pursue this would be help...

Get source code with Chinese characters PHP

Well, I give up. I've been messing around with all I could think of to retrieve data from a target website that has information in traditional Chinese encoding (charset=GB2312). I've been using the simple_html_parser like always but it doesn't seem to return the Chinese characters, in fact all I get are some weird question marks embedde...

Can't make (UTF-8) Traditional Chinese Character to work in PHP gettext extension (.po and .mo files created in poEdit)

I checked MSDN and the locale string is zh_Hant, but I also tried with zh_TW (Chinese, Taiwan). The Traditional Chinese characters look OK in poEdit, but when I open the file in the browser the characters are just weird symbols («¢Åo¥@¬É!). I think the translation is working but there's something wrong with the encoding (I used UTF-8...

Programmatically determine number of strokes in a Chinese character?

Does Unicode store this information about characters? ...
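
The Unicode Unihan database carries a `kTotalStrokes` field, and its text files use tab-separated lines of the form `U+XXXX<TAB>field<TAB>value`. The sketch below parses lines in that format. The function name `parse_total_strokes` and the inline sample lines are assumptions for illustration; real input would be read from a downloaded Unihan file.

```python
def parse_total_strokes(lines):
    """Build a {character: stroke count} map from Unihan-format lines."""
    strokes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field != "kTotalStrokes":
            continue
        char = chr(int(codepoint[2:], 16))  # "U+4E3B" -> "主"
        # Some entries list more than one count; keep the first.
        strokes[char] = int(value.split()[0])
    return strokes


# Illustrative sample lines in Unihan format (assumed, not read from a file).
sample = [
    "# sample data",
    "U+4E3B\tkTotalStrokes\t5",
    "U+8D70\tkTotalStrokes\t7",
    "U+4E3B\tkMandarin\tzhǔ",
]
print(parse_total_strokes(sample))
```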

Chinese encoding issue while listing files

I am running a Java application on a Solaris 10 machine with a Chinese locale. There are some files in a directory with Chinese filenames. When I do files = new File(dir).list(), where "dir" is the parent directory containing the Chinese file, I get the resulting filename files[0] as ????? (some junk characters). Now the deal is that my program's file.enc...

IE7 not displaying Chinese characters in <select>

I have installed fonts for East Asian languages and everything outside of select boxes displays correctly, but I just get squares inside of select boxes. I've seen from Google that other people have experienced this, but there doesn't seem to be a solution that I've found. Anyone out there have one? I'm specifying UTF-8 encoding v...

PostgreSQL full text search - Japanese, Chinese, Arabic

I'm designing a full text search function in PostgreSQL for my current project. It works OK with ispell/myspell dictionaries so far. Now I need to add support for Chinese, Japanese and Arabic search. Where do I start? There are no templates or dictionaries available for those languages as far as I can see. Will it work with pg_catalog.sim...

String searching algorithm for Chinese characters.

There is Python code available for existing string searching algorithms, e.g. the Boyer-Moore algorithm. I am looking to use this on Chinese characters and it doesn't seem like the same implementation would work. How would I go about making the algorithm work on Chinese characters? I am referring to this: http://...
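
As a rough sketch of the idea: in Python 3 a `str` is a sequence of Unicode code points, so a skip-table search works on Chinese text as long as the table is a dictionary keyed by character rather than a fixed 256-entry array. The code below uses the Boyer-Moore-Horspool simplification (bad-character shifts only), not full Boyer-Moore, and it is not the implementation behind the truncated link. The name `horspool_search` is an assumption.

```python
def horspool_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For every pattern character except the last, store how far its last
    # occurrence is from the end of the pattern. A dict handles any
    # character, including CJK ideographs.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the table entry for the text character aligned with the
        # pattern's last position; unknown characters allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("请问主楼怎么走", "怎么"))
```

In Python 2 the same code would need `unicode` objects rather than byte strings, otherwise the search runs over individual bytes of a multi-byte encoding.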

iPhone Chinese simplified to traditional character conversion

Is there a way to convert Chinese simplified characters to traditional characters in Cocoa/Objective-C? On the .NET platform you can include a VB dll in your projects that gives you access to a function for an easy conversion. Is there anything I can use in Cocoa/Objective-C that will allow me to do the same? I want to go between simplif...

Chinese Characters in email sent via PHP not showing up.

Hi all, a funny problem. I send mail via PHP from my testing server with Chinese characters in it and it sends perfectly. The encoding is UTF-8. When I upload the same PHP file to another server and try to send from there, the e-mail looks 90% fine in one mail client (web-based mail actually, Gmail), but in my mail client (Apple Mail) it's a...

Chinese thesaurus

Hi, I'm working on multilingual software where I need to show synonyms of Chinese text to the user. I couldn't find any API that gives synonyms of a given Chinese word. I tried the MS Office proofing tools, but they do not support a thesaurus for Chinese; they only provide grammar, translation, etc. Please suggest some API or workarou...

Chinese string input for Python?

How can I get Python to work with simplified Chinese text input, either as strings or raw input? ...

How to print Chinese words in my code using Python

This is my code: print '哈哈'.decode('gb2312').encode('utf-8') and it prints: SyntaxError: Non-ASCII character '\xe5' in file D:\zjm_code\a.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details. How do I print '哈哈'? Thanks. Updated: when I use this: #!/usr/bin/python # -*- coding: utf-8 -*- ...
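
A minimal sketch of the encoding round trip, assuming Python 3 and a source file saved as UTF-8 (the question itself uses Python 2, where the PEP 263 coding line is required for non-ASCII literals). The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# The coding line above is what PEP 263 asks for; Python 3 already
# assumes UTF-8 source, so there it is optional.

text = "哈哈"                       # a Unicode string, not bytes

utf8_bytes = text.encode("utf-8")   # bytes to write to a UTF-8 file or socket
gb_bytes = text.encode("gb2312")    # bytes for a GB2312 consumer

print(text)                         # printing the str lets Python pick the console encoding
print(utf8_bytes)
print(gb_bytes.decode("gb2312") == text)
```

The '\xe5' in the error message is the first UTF-8 byte of 哈, which suggests the file was saved as UTF-8, so decoding the literal as GB2312 would not match the bytes actually in the file.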

Submitting Chinese characters results in XML entities

I am submitting a Chinese character to my form, but once it is submitted it arrives as an XML entity. For example, I enter 星洲 and the value that reaches my form is &#26143;&#27954; Any suggestions on how to convert this XML entity to the equivalent Chinese character? ...
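
Browsers commonly substitute numeric character references like these when the form's encoding cannot represent the submitted character, so serving and submitting the page as UTF-8 usually avoids the problem at the source. If the references have already been stored, they can be decoded afterwards. A small sketch in Python, using the standard library's `html.unescape`; the variable names are assumptions.

```python
import html

# Value as received by the form handler in the question.
submitted = "&#26143;&#27954;"

# html.unescape converts numeric (and named) character references
# back to the characters they stand for.
decoded = html.unescape(submitted)
print(decoded)
```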

Does anyone know why Silverlight does not provide IME support when the plugin is set to 'windowless=true'?

In Silverlight (up to version 4), if you set the 'windowless' property of the Silverlight plugin to 'true', you cannot get any IME support in a TextBox. Does anyone know why? Is it a security concern or something else? ...

Freely available dictionary data for Chinese, Japanese, CJK characters

I am developing an online CJK character dictionary application, and already found the following databases: Unicode Unihan Database; Jim Breen's JMDict and KanjiDic; CEDict; HanDeDict. As I am looking for more data, web searches often lead me to online dictionaries, but not the data itself, using the same sources over again. If you know ...

How to find all Chinese text in a string using Python?

I needed to strip the Chinese out of a bunch of strings today and was looking for a simple Python regex. Any suggestions? ...
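
One simple option, sketched here for Python 3, is a character class over the main CJK Unified Ideographs block (U+4E00 to U+9FFF). The names `CJK_RE`, `find_chinese`, and `strip_chinese` are assumptions, and the range is a deliberate simplification: it does not cover the extension blocks or Chinese punctuation.

```python
import re

# Runs of characters in the basic CJK Unified Ideographs block.
CJK_RE = re.compile(r"[\u4e00-\u9fff]+")


def find_chinese(s):
    """Return every run of Chinese characters found in s."""
    return CJK_RE.findall(s)


def strip_chinese(s):
    """Return s with every run of Chinese characters removed."""
    return CJK_RE.sub("", s)


print(find_chinese("hello 主楼 world 怎么走!"))
print(strip_chinese("abc主楼def"))
```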