I'm writing a function that uses ICU to parse a Unicode string consisting of kanji numeral character(s) and returns the integer value of the string.

"五" => 5
"三十一" => 31
"五千九百七十二" => 5972

I'm setting the locale to Locale::getJapan() and using NumberFormat::parse() to parse the string. However, whenever I pass it any kanji characters, parse() sets the status to U_INVALID_FORMAT_ERROR.

Does anyone know whether ICU's NumberFormat::parse() supports kanji numeral strings? I was hoping that, since I set the locale to Japanese, it would be able to parse kanji numeric values.

Thanks!

#include <iostream>
#include <memory>
#include <unicode/numfmt.h>

using namespace std;
using namespace icu;  // ICU's C++ API lives in namespace icu

int main() {
    const Locale &jaLocale = Locale::getJapan();
    UErrorCode status = U_ZERO_ERROR;
    unique_ptr<NumberFormat> nf(NumberFormat::createInstance(jaLocale, status));
    if (U_FAILURE(status)) {
        cout << "error creating formatter: " << u_errorName(status) << endl;
        return 1;
    }

    UChar number[] = {0x4E94, 0}; // '五' (5 in Japanese), NUL-terminated for UnicodeString
    UnicodeString numStr(number);
    Formattable formattable;
    nf->parse(numStr, formattable, status);
    if (U_FAILURE(status)) {
        cout << "error parsing as number: " << u_errorName(status) << endl;
        return 1;
    }
    cout << "long value: " << formattable.getLong() << endl;
    return 0;
}
+1  A: 

I was inspired by your question to solve this problem using Python.

If you don't find a C++ solution, it shouldn't be too hard to adapt this to C++.
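For a dependency-free alternative, here is a minimal C++17 sketch of the usual algorithm. It is not a port of the linked Python (which isn't reproduced here), and the function name `parseKanjiNumber` is hypothetical. It assumes the input uses only 〇 through 九, the small units 十/百/千 and the large units 万/億/兆, and it does no overflow or unit-order checking.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not from any library): converts a kanji numeral such as
// "五千九百七十二" to an integer. Returns std::nullopt for empty input or any
// character outside 〇-九, 十/百/千, 万/億/兆. No overflow or ordering checks.
std::optional<long long> parseKanjiNumber(const std::u32string &s) {
    static const std::u32string kDigits = U"〇一二三四五六七八九";
    if (s.empty()) return std::nullopt;

    long long total = 0;    // sum of groups already closed by 万/億/兆
    long long section = 0;  // value of the current group (below 10000)
    long long digit = -1;   // pending digit(s), -1 if none

    for (char32_t ch : s) {
        std::size_t d = kDigits.find(ch);
        if (d != std::u32string::npos) {
            // Consecutive digits are read positionally, e.g. "二〇" -> 20.
            digit = (digit < 0 ? 0 : digit * 10) + static_cast<long long>(d);
            continue;
        }
        long long unit = 0;
        bool big = false;
        switch (ch) {
            case U'十': unit = 10; break;
            case U'百': unit = 100; break;
            case U'千': unit = 1000; break;
            case U'万': unit = 10000LL; big = true; break;
            case U'億': unit = 100000000LL; big = true; break;
            case U'兆': unit = 1000000000000LL; big = true; break;
            default: return std::nullopt;  // not a kanji numeral character
        }
        if (big) {
            // Close the current group, e.g. "二千万" -> 2000 * 10000.
            section += (digit < 0 ? 0 : digit);
            total += (section == 0 ? 1 : section) * unit;
            section = 0;
        } else {
            // A bare small unit means one of it: "十" -> 10, "百" -> 100.
            section += (digit < 0 ? 1 : digit) * unit;
        }
        digit = -1;
    }
    return total + section + (digit < 0 ? 0 : digit);
}
```

For "五千九百七十二" each digit multiplies the small unit after it (5000 + 900 + 70), and the trailing 二 is added at the end, giving 5972. A large unit such as 万 multiplies everything accumulated in the current group, which is how "一億二千万" becomes 120000000.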

Ryan Ginstrom
+2  A: 
blackkettle
+2  A: 

You can use ICU's Rule-Based Number Format (RBNF): in C++, `RuleBasedNumberFormat` from rbnf.h; in C, unum.h with the UNUM_SPELLOUT style. In both cases, use the "ja" locale for Japanese.

Steven R. Loomis
This is the correct answer: instead of `NumberFormat::createInstance(jaLocale, status);` use `new RuleBasedNumberFormat(URBNF_SPELLOUT, jaLocale, status);`
Artyom
A: 

This is actually quite difficult, especially once you start looking at the obscure kanji for very large numbers.

In Perl, there is a very complete implementation in Lingua::JA::Numbers. Its source might be useful inspiration if you want to port it to C++.

Gavin Brock