tags:

views:

183

answers:

1

I've got a WinForms RichTextBox in my application. When I enter the Chinese text "蜜蜜蜜蜜", the control uses the following RTF:

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fmodern\fprq6\fcharset134 SimSun;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}} \viewkind4\uc1\pard\f0\fs17\'c3\'db\'c3\'db\'c3\'db\'c3\'db\f1\par }

The test string is the same character four times. It's Unicode value is 34588 (0x871C). So how is it that the character is being stored as "\'c3\'db" in the RTF? What kind of encoding is that?

+1  A: 

RTF is old, older than Job and considerably predates Unicode. I think it using code page 936, a double-byte character set for Simplified Chinese. Your snippet shows it using c3db for the character, it matches the glyph shown in this table.

Hans Passant