Glib::ustring is supposed to work well with UTF-8, but I have a problem when working with Japanese strings.

If I compare the two strings "わたし" and "ワタシ" using the == operator or the compare() method, the result says that they are equal.

I don't understand why. How does Glib::ustring work?

The only way I found to make the comparison return false is to compare strings of different lengths, for example "海外わたわ" and "海外わた".

Very strange...

A:

Glib::ustring::compare uses g_utf8_collate() internally, which compares strings according to the rules of the current locale. Is your locale set to something other than Japanese?

ptomato
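To illustrate the locale dependence described in this answer, here is a minimal sketch. It does not call g_utf8_collate() itself; as an assumption for illustration, it uses the C++ standard library's std::collate facet, which likewise compares strings according to the rules of a given locale. The function name collate_compare is hypothetical.

```cpp
#include <locale>
#include <string>

// Compares two byte strings using the collation rules of `loc`.
// Returns a negative value, 0, or a positive value, like strcmp().
// std::collate is used here as a stand-in for GLib's g_utf8_collate():
// both apply locale-specific ordering, so the answer can change with
// the locale. In the classic "C" locale it is a plain byte-wise order.
int collate_compare(const std::string& a, const std::string& b,
                    const std::locale& loc) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}
```

With std::locale::classic(), "わたし" and "ワタシ" compare as different because their UTF-8 bytes differ. Passing std::locale("") instead applies whatever collation the user's environment defines, so the result may vary between machines (and std::locale("") throws std::runtime_error if the environment names a locale that is not installed).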
You might be right, but as long as I'm using a UTF-8 locale (whether Japanese or Italian), I should be able to handle UTF-8 characters.
baol
Handling UTF-8 characters and comparing them lexically (collation) are different things. The collation is most likely optimized so that in Latin-alphabet locales, non-Latin characters all sort the same and therefore compare as equal. It seems a little strange, but if you want to bypass it, get the `c_str()` and compare with `strcmp()`.
ptomato
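A minimal sketch of the `strcmp()` workaround suggested in the comment above. It is written against plain `const char*` so that it compiles without glibmm; with a Glib::ustring you would pass `s.c_str()`. The helper name bytewise_equal is hypothetical.

```cpp
#include <cstring>

// Byte-wise equality: bypasses locale-dependent collation entirely.
// For Glib::ustring arguments, call bytewise_equal(s1.c_str(), s2.c_str()).
bool bytewise_equal(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Because byte order in UTF-8 matches code point order, the sign of strcmp() also gives a consistent ordering, but it is not a linguistic one and it ignores the locale.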
A:
#include <iostream>
#include <glibmm/ustring.h>
int main() {
  Glib::ustring s1 = "わたし";  // hiragana
  Glib::ustring s2 = "ワタシ";  // katakana
  // Prints 1 if the strings compare equal, 0 otherwise.
  std::cerr << (s1 == s2) << std::endl;
  return 0;
}

Output: 0

EDIT: But I dug a little deeper:

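#include <iostream>
#include <locale>
#include <glibmm.h>
int main() {
  Glib::ustring s1 = "わたし";  // hiragana
  Glib::ustring s2 = "ワタシ";  // katakana
  // The program starts in the standard "C" locale.
  std::cout << (s1 == s1) << std::endl;
  std::cout << (s1 == s2) << std::endl;
  // Switch to the environment's locale. Because this locale has a name,
  // std::locale::global() also sets the C locale via setlocale(LC_ALL, ...).
  std::locale::global(std::locale(""));
  std::cout << (s1 == s1) << std::endl;
  std::cout << (s1 == s2) << std::endl;
  std::cout << s1 << std::endl;
  std::cout << s2 << std::endl;
  return 0;
}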

Output:

1
0
1
1
わたし
ワタシ

And this sounds strange.

baol
See http://www.gnu.org/s/libc/manual/html_node/Standard-Locales.html, which says that the standard locale is "C" and that "" maps to your system's locale. I would bet that these days "" maps to a Unicode (UTF-8) locale. See http://unicode.org/faq/collation.html for more information.
Armentage
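One plausible mechanism for the change from 0 to 1 in the second program, offered as a hedged sketch: std::locale::global() also calls std::setlocale(LC_ALL, ...) when the locale it installs has a name, so it changes the C library locale as well, and GLib documents g_utf8_collate() as using the rules of the current locale. The sketch below only checks that side effect; the helper name install_global_locale is hypothetical.

```cpp
#include <clocale>
#include <locale>
#include <string>

// Installs `loc` as the global C++ locale, then queries the C library
// locale (passing nullptr to setlocale() reads it without changing it).
// For a named locale, std::locale::global() also updates the C locale.
std::string install_global_locale(const std::locale& loc) {
    std::locale::global(loc);
    const char* c_name = std::setlocale(LC_ALL, nullptr);
    return c_name ? std::string(c_name) : std::string();
}
```

With std::locale::classic() the C locale comes back as "C". Passing std::locale("") instead would report the environment's locale name, which is presumably the locale g_utf8_collate() consults for the later comparisons.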