ansaurus

Question

Where does glibc get its database of Unicode attributes?

Answer 1

A:

I believe that it's defined in the locale definition file. See this page for more information about locales. glibc includes a bunch of locale definitions in localedata/locales, although none of them seem to have any width information.

Adam Rosenfield 2009-05-05 02:44:59

However, locales are generated by the localedef application, which... comes with glibc. I'm more interested in finding the canonical location to edit this information...

bdonlan 2009-05-05 02:48:48

Answer 2

+1 A:

Okay, so I'm just poking around myself so I'm not absolutely sure, but it appears that the table you are looking for is found in the following location relative to the glibc root:

localedata/locales/i18n

This appears to be the Unicode (version 5) locale. It contains the following, which is where I believe you need to make your changes:

% ENCLOSED ALPHANUMERICS/
   <U24D0>..<U24E9>;/

In case you're wondering, the function ctype_output (ld-ctype.c) calls allocate_arrays, which calls wcwidth_table_init. The function wcwidth_table_init is generated by 3level.h (which also generates other tables that follow the same template). This is the chain that I followed to track down the files in localedata/locales.

Like I said, I'm not 100% sure that this is the right table, but I thought I'd share what I had found.

Naaff 2009-05-05 03:11:33

The comments in that file suggest it's generated by localedata/gen-unicode-ctype.c, which talks about a UnicodeData file, but where is the UnicodeData file that's used in the glibc distribution...? I don't want to patch a generated file, it seems like that'd get sticky the next time there's a new release.

bdonlan 2009-05-05 05:57:46

Hmmm... that's a good point. Have you tried modifying the generated file anyway, just to verify that wcwidth() returns the correct values? This might be useful as it would prove that we're on the right path. Then we could put more effort into finding out how the files are generated so the problem can be fixed at its root.
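A quick way to run that check might look like the sketch below. It only uses the standard setlocale() and wcwidth() calls; the helper names width_in_locale and report_enclosed_alnum are made up for illustration, and whichever locale name you pass in (en_US.UTF-8, say) is an assumption about what is installed on your system.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to `locale_name` and return the
   column width that wcwidth() reports for `wc`. Returns -2 if the locale
   cannot be selected, so it is not confused with wcwidth()'s own -1
   (which means "not printable"). */
int width_in_locale(const char *locale_name, wchar_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;
    return wcwidth(wc);
}

/* Hypothetical helper: print the width of U+24D0, the first code point
   of the ENCLOSED ALPHANUMERICS range quoted from localedata/locales/i18n. */
void report_enclosed_alnum(const char *locale_name)
{
    int w = width_in_locale(locale_name, (wchar_t)0x24D0);
    printf("%s: wcwidth(U+24D0) = %d\n", locale_name, w);
}
```

Calling report_enclosed_alnum() from main() before and after rebuilding the locale with localedef should show whether the edit to the generated file actually reaches wcwidth().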

Naaff 2009-05-05 14:58:47

Answer 3

A:

I believe it is explained somewhere around there

dmityugov 2009-05-05 12:22:46

Answer 4

A:

It looks like the data is generated by the (apparently manually run) localedata/gen-unicode-ctype.c from the Unicode data files published at http://unicode.org/Public/UNIDATA/ . Thanks to Naaff for pointing me in the right direction!
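For anyone who wants to look at those data files directly: UnicodeData.txt is a plain-text file with one record per line and fields separated by semicolons, starting with the code point in hex, the character name, and the general category. Below is a rough sketch of reading those first three fields. It is not how gen-unicode-ctype.c does it; struct ucd_entry and parse_ucd_line are names invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type holding the first three fields of a line. */
struct ucd_entry {
    unsigned long code_point;
    char name[128];
    char category[8];
};

/* Parse "XXXX;NAME;Gc;..." into `out`. Returns 0 on success, or -1 if the
   line lacks three ';'-separated fields or a field does not fit. */
int parse_ucd_line(const char *line, struct ucd_entry *out)
{
    const char *name_start, *cat_start, *cat_end;
    char *end;
    size_t len;

    /* Field 0: code point in hexadecimal, terminated by ';'. */
    out->code_point = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return -1;

    /* Field 1: character name. */
    name_start = end + 1;
    cat_start = strchr(name_start, ';');
    if (cat_start == NULL)
        return -1;
    len = (size_t)(cat_start - name_start);
    if (len >= sizeof out->name)
        return -1;
    memcpy(out->name, name_start, len);
    out->name[len] = '\0';

    /* Field 2: general category, e.g. "Ll" or "So". */
    cat_start++;
    cat_end = strchr(cat_start, ';');
    if (cat_end == NULL)
        return -1;
    len = (size_t)(cat_end - cat_start);
    if (len >= sizeof out->category)
        return -1;
    memcpy(out->category, cat_start, len);
    out->category[len] = '\0';
    return 0;
}
```

Feeding it the record for U+24D0 gives the name and category for the first character in the ENCLOSED ALPHANUMERICS range mentioned above.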

bdonlan 2009-05-06 04:27:43
