icu

Code to strip diacritical marks using ICU

Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented, unumlauted, etc., character equivalents, e.g., every accented é would become a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.: UnicodeString strip_diacritics( Un...

Why does ICU's Locale::getDefault() return "root"?

Using the ICU library with C++ I'm doing: char const *lang = Locale::getDefault().getLanguage(); If I write a small test program and run it on my Mac system, I get en for lang. However, inside a larger group project I'm working on, I get root. Anybody have any idea why? I did find this: http://userguide.icu-project.org/locale/resou...

ICU regex quoting

I am wondering if there is a way to quote a string in the ICU (C++) library. There is "\Q" + string + "\E", but the string I need to quote is generated input, so it may itself contain "\E". There does not seem to be any ICU regex quote method. Would just changing every "\E" in the string to \\E work? ...
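One way to approach the quoting question above, sketched in plain C++ with no ICU calls: wrap the input in \Q...\E and, wherever the input itself contains \E, close the quote, emit an escaped backslash followed by a literal E, and reopen the quote. This is the approach Java's Pattern.quote is known to take; applying it to ICU is an assumption based on ICU supporting \Q and \E. The name quote_regex_literal is hypothetical and is not part of ICU.

```cpp
#include <string>

// Hypothetical helper (not an ICU API): wraps `literal` in \Q...\E so that a
// regex engine supporting \Q and \E treats it as plain text. Any "\E" already
// inside the input would end the quote early, so each occurrence is emitted
// as: \E (close the quote), \\E (escaped backslash plus a literal E),
// \Q (reopen the quote).
std::string quote_regex_literal(const std::string& literal) {
    std::string quoted = "\\Q";
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type hit = literal.find("\\E", pos);
        if (hit == std::string::npos) {
            // No further "\E": copy the remainder unchanged.
            quoted.append(literal, pos, std::string::npos);
            break;
        }
        quoted.append(literal, pos, hit - pos);  // text before the "\E"
        quoted += "\\E\\\\E\\Q";                 // close, literal \E, reopen
        pos = hit + 2;                           // skip past the "\E"
    }
    quoted += "\\E";
    return quoted;
}
```

The result could then be converted with UnicodeString::fromUTF8 and passed to RegexPattern::compile. Replacing \E with \\E alone would probably not be enough, since the \E inside \\E would still be read as the end of the quote.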

Building Xalan 1.11 using ICU - Mac

Hi everybody, has anybody compiled Xalan 1.11 using ICU? I am building it with ICU, and it generates one library called libxalanMsg.111.0.dylib, which is produced by the steps below ============ /tmp/brijesh/ICU//bin/genrb -p xalanMsg -d ../../../nls/icu-i ../../../nls/icu ../../../nls/icu/en_US.txt echo ../....

Removing a trailing Space from Regex Matched group

I'm using the regular expression library icucore via RegKit on the iPhone to replace a pattern in a large string. The pattern I'm looking for looks something like this: | hello world (P1)| I'm matching this pattern with the following regular expression: \|((\w*|.| )+)\((\w\d+)\)\| This transforms the input string into 3 groups when a match...

UnicodeString to char* (UTF-8)

I am using the ICU library in C++ on OS X. All of my strings are UnicodeStrings, but I need to use system calls like fopen, fread and so forth. These functions take const char* or char* as arguments. I have read that OS X supports UTF-8 internally, so that all I need to do is convert my UnicodeString to UTF-8, but I don't know how to do ...

How to make ICU smaller?

WebKit uses ICU, but I don't have enough space to hold icudt42.dll. The size of icudt42.dll is about 10.4 MB, but I only need Chinese, Russian, and English, so how can I make icudt.dll smaller? ...

Android: moving from icu4j to icu4c

Hello, I have a class used with the Android frameworks that calls icu4j's ArabicShaping. I've now merged this class into another Android branch that uses icu4c (the C implementation), but the build fails with an error saying it cannot find ArabicShaping... Searching the icu4c files shows that it has both ArabicShaping.c and ushape.c, but I don't kno...

Linux Installing Library (ICU) Question

I'm a relative noob to installing libraries. My system currently has an older version of the ICU library (3.8) and I want to go to the latest (4.4). Following the steps in the ICU readme.html, everything goes fine (echo $? produces 0 for every step), and I see the library was installed to /usr/local/lib. However the current version of t...

C++ encode string to Unicode - ICU library

Hi, I need to convert a bunch of bytes in ISO-2022-JP and ISO-2022-JP-2 (and other variations of ISO-2022) into Unicode. I am trying to use ICU, but the following code doesn't work. std::string input = "\x1B\x28\x4A" "ABC\xA6\xA7"; //the first 3 chars are escape sequence to use JIS_X201 character set in GL/GR UErrorCode...

icu4c: ushape.c missing character in shaping?

Hello, in our language we write with Arabic characters, with some differences. ICU's ushape.c (the Arabic shaper) only works with the main Arabic characters and doesn't shape my language-specific characters (e.g. 0x6D5). I've changed ushape.c to work with my language, and it works well except for one character, 0x649. In Arabic they ...

What languages are supported in ICU collation?

I was browsing through the ICU source code (http://icu-project.org/), and I couldn't find what languages it supports out of the box for collation. Could someone help me? ...

ICU Probe All Currency Symbols

Is there a way to probe the ICU library for all UChars representing currency symbols supported by the library? My current solution is to iterate through all locales and, for each locale, do something like this: const DecimalFormatSymbols *formatSymbols = formatter->getDecimalFormatSymbols(); UnicodeString currencySymbol = formatSymbo...

[ICU4C] NumberFormat/DecimalFormat treats certain floating-point values as longs instead of doubles

NumberFormat/DecimalFormat doesn't seem to parse strings with the "#.0" format (where # is any number) as a double. The following code illustrates this: #include <cstdio> #include <iostream> #include <unicode/decimfmt.h> #include <unicode/numfmt.h> #include <unicode/unistr.h> #include <unicode/ustream.h> int main() { UErrorCode sta...

[ICU4C] Set UnicodeString to C string without allocating a new UnicodeString

As of ICU 4.2.1, the only straightforward way to set a UnicodeString to a C string is to construct a new UnicodeString with the data, and then set the desired string to the new one, thus allocating, copying, and deallocating data more than I'd like. Is there a way to set a UnicodeString to a (null-terminated/length) C string without ha...