questions about icu | ansaurus

icu

Is there an STL and UTF-8 friendly C++ Wrapper for ICU, or other powerful Unicode library

I need a good Unicode library for C++. I need: Transformations in a Unicode-sensitive way, for example sorting all strings case-insensitively and getting their first characters for an index. Conversion of various Unicode strings to upper and lower case. Splitting text at a reasonable position -- words that would work for Chinese and Japanese as ...

The word break rule file

IBM has apparently open-sourced their ICU source code for Unicode and Globalization support, part of which is a text boundary locator for detecting where breaks can be located in text. However, the break detection stuff relies on rules and I cannot locate the rules files anywhere. Where can I get the word break rules text files for com...

How can I transliterate Chinese text to pinyin on iPhone?

The localization saga continues... So I'm trying to support collation of Chinese text in my iPhone app, and after talking to a native Chinese speaker, I think I understand how the Chinese do it... Let's say you had the string 巴拉克·奥巴马 and you wanted to figure out which section of the Chinese phonebook to put it in (in this example I'm ig...

internationalization

Looking for a good tutorial for ICU

I was recently looking for a toolkit/library with good Unicode support. I checked ICU, Qt3, Qt4 and Glib. Unfortunately, all of them except ICU had some missing features or had implemented them incorrectly. Unfortunately, the ICU library has quite bad documentation and is very hard to use because it ignores most of modern C++ de...

How to parse kanji numeric characters using ICU?

I'm writing a function using ICU to parse a Unicode string which consists of kanji numeric character(s), and I want to return the integer value of the string. "五" => 5, "三十一" => 31, "五千九百七十二" => 5972. I'm setting the locale to Locale::getJapan() and using NumberFormat::parse() to parse the character string. However, whenever I pass...

Does ICU handle the collation of a list of strings of varying languages?

My application may have strings comprised of different alphabets / languages in a single list. I can't seem to find any information on what the correct method for sorting these should be or any indication that ICU supports this functionality. Example List: Apple яблоко μήλο Baby βρέφος ребенок ...

internationalization

Finding type of break in icu::BreakIterator

I'm trying to understand how to use icu::BreakIterator to find specific words. For example, I have the following sentence: To be or not to be? That is the question... The word instance of the break iterator would put breaks here: |To| |be| |or| |not| |to| |be|?| |That| |is| |the| |question|.|.|.| Now, not every pair of break points is a...

natural-language

IBM ICU - String conversion functions

In the IBM ICU C library, are there any string-to-number conversion functions, something like atoi and atoll? I am looking for an ICU function for string conversions that is cross-platform, cross-compiler, and available in 32- and 64-bit versions. 1. The function should throw an error on overflow or underflow. 2. I thought of using "errno" -- But errno is not set in all plat...

Compiling the icu sqlite extension statically linked to icu.

I want to compile the icu sqlite extension statically linked to icu. This is what I've tried, maybe the mistake is obvious to you. > cd icu/source > ./runConfigureIcu Linux --enable-static --with-packaging-format=archive ... > make > cd ../../icu-sqlite > gcc -o libSqliteIcu.so -shared icu.c -I../icu/source/common -I../icu/sour...

sqlite3 icu extension

Hi, I think sqlite3 is very handy software in many situations. But I need ICU support for sort order. I have read many documents on the Internet. I gave up on using sqlite a few times and deleted my downloads. But I need sqlite again and again. Is it so difficult to create an extension for download? Where can I find a ready-to-use extension? Please h...

ICU Custom Currency Formatting (C++)

Is it possible to custom-format currency strings using the ICU library, similar to the way it lets you format time strings by providing a format string (e.g. "mm/dd/yyy")? That way, for a given locale (say USD), I could have all currency strings come back as "xxx.00 $ USD". ...

How to convert estimated time to string with ICU library

How to convert estimated time to string with ICU library? ...

ICU Unicode Normal vs Fullwidth

I am somewhat new to Unicode and Unicode strings. I'm trying to determine the difference between a "fullwidth" symbol and a normal one. Take these two for example: Normal: http://www.fileformat.info/info/unicode/char/20a9/index.htm Fullwidth: http://www.fileformat.info/info/unicode/char/ffe6/index.htm I notice that the fullwidth is def...

internationalization

Problem with cross compiling icu

Hi all, I am trying to cross-compile the ICU library for the iPhone. I downloaded a configure script wrapper from http://sites.google.com/site/michaelsafyan/coding/articles/iphone When I execute it, I get the following error message: checking wchar.h usability... no checking wchar.h presence... yes configure: WARNING: wchar.h: present b...

cross-compiling

iPhone app rejection for using ICU (Unicode extensions)

I received the following mail from Apple concerning my application: *Thank you for submitting your update to Νομοθεσία to the App Store. During our review of your application we found it is using private APIs, which is in violation of the iPhone Developer Program License Agreement section 3.3.1; "3.3.1 Applications may only use Doc...

Can you get access to the NumberFormatter used by ICU MessageFormat

This may be a niche question, but I'm working with ICU to format currency strings. I've bumped into a situation that I don't quite understand. When using the MessageFormat class, is it possible to get access to the NumberFormat object it uses to format currency strings? When you create a NumberFormat instance yourself, you can specify a...

ICU add custom character set detection

Hi everybody, Does anybody know how the ICU Charset Detector's data is built, and is it difficult to add additional languages? For example, I saw in the bug tracker that a ticket for the detection of Thai has been open since 2007, with nothing new to date. Thanks ...

internationalization

Comparing ICU sort keys (collator_get_sort_key) in PHP

Is strcmp() appropriate for comparing ICU collator sort keys in PHP? The sort keys I'm asking about are those returned by collator_get_sort_key(), which are described in the ICU documentation. ...

C++ UTF-8 output with ICU

I'm struggling to get started with the C++ ICU library. I have tried to get the simplest example to work, but even that has failed. I would just like to output a UTF-8 string and then go from there. Here is what I have: #include <unicode/unistr.h> #include <unicode/ustream.h> #include <iostream> int main() { UnicodeString s = UNI...

Locale C++ shared library in /usr/local/lib

I'm venturing into the world of C++ and Linux, and am having problems linking against a shared library. I have a library, libicuuc.so.44.1, installed in /usr/local/lib. There is also a link in the same directory, libicuuc.so.44 pointing to that library. My /etc/ld.so.conf reads: include /etc/ld.so.conf.d/*.conf I have a file, /etc/l...

shared-libraries
