ansaurus

Question

How to output unicode string to RTF (using C#)

Answer 1

A:

You will have to convert the string to a byte[] array (using Encoding.Unicode.GetBytes(string)), then loop through that array and prepend a \ and u character to every non-ASCII character you find. When you then convert the array back to a string, you'd have to keep those characters as their numeric values rather than as the characters themselves.

For example, if your array looks like this:

byte[] unicodeData = new byte[] { 0x15, 0x76 };

it would become:

// 5c = \, 75 = u
byte[] unicodeData = new byte[] { 0x5c, 0x75, 0x15, 0x76 };
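For comparison, here is a minimal sketch of the text RTF's \u control word expects, assuming the goal is RTF output rather than a modified byte array. The two example bytes (0x15, 0x76), read as little-endian UTF-16, decode to the single character U+7615. RTF wants that value written as decimal ASCII digits after \u, followed by a fallback character. The class and method names are hypothetical.

```csharp
using System.Text;

// Hypothetical helper, for illustration only; not part of the original answer.
static class RtfBytesDemo
{
    // Decodes the example's little-endian UTF-16 bytes, then writes the
    // character's numeric value as decimal digits after \u, with '?' as
    // the fallback for readers that do not understand \u.
    public static string ExampleToRtf()
    {
        byte[] unicodeData = new byte[] { 0x15, 0x76 };
        string text = Encoding.Unicode.GetString(unicodeData); // one char, U+7615
        return "\\u" + (int)text[0] + "?";                      // yields \u30229?
    }
}
```

Calling RtfBytesDemo.ExampleToRtf() gives the ASCII text \u30229?, since 0x7615 is 30229 in decimal.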

Ian Kemp 2009-09-02 14:38:46

Hi, thank you for the response. I've tried to implement your solution, but unfortunately it's not working. I think that's because there is a difference between the code point and the UTF-16 encoding (Encoding.Unicode): you are suggesting that I output bytes from the UTF-16 encoding where a code point is expected. (This works for many characters, but not all.)

Emir 2009-09-02 15:18:53

This answer also seems to work; I probably had a bug in my code when I was testing it. Thank you for your answer and your time.

Emir 2009-09-03 11:06:32

Answer 2

+3 A:

Provided that all the characters you're catering for exist in the Basic Multilingual Plane (it's unlikely that you'll need anything more), a simple UTF-16 encoding should suffice.

Wikipedia:

All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use.
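To see what the BMP caveat means in practice, here is a small sketch (BmpDemo is a hypothetical name) that lists the UTF-16 code units of a string in hex. A character such as U+1F600, which lies outside the BMP, is stored in a .NET string as two code units (a surrogate pair), so a loop over the string's chars sees two values rather than one code point.

```csharp
using System;

// Hypothetical demo, for illustration only.
static class BmpDemo
{
    // Returns the UTF-16 code units of s as four-digit uppercase hex,
    // separated by spaces.
    public static string CodeUnits(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = ((int)s[i]).ToString("X4");
        return string.Join(" ", parts);
    }
}
```

For example, BmpDemo.CodeUnits("ë") returns "00EB", while BmpDemo.CodeUnits(char.ConvertFromUtf32(0x1F600)) returns "D83D DE00"; char.ConvertToUtf32 on that two-char string at index 0 recovers the single code point 0x1F600.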

The following sample program illustrates doing something along the lines of what you want:

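using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Decode the UTF-16LE bytes EB 00 into the character 'ë' (U+00EB)
        char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
        using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
        {
            sw.WriteLine(@"{\rtf1
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
        }
    }

    static string GetRtfUnicodeEscapedString(string s)
    {
        var sb = new StringBuilder();
        foreach (var c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                // RTF's own special characters must be escaped with a backslash
                sb.Append('\\').Append(c);
            else if (c <= 0x7f)
                // Plain ASCII can be written as-is
                sb.Append(c);
            else
                // \uN takes a decimal value; '?' is the fallback for non-Unicode readers
                sb.Append("\\u" + Convert.ToUInt32(c) + "?");
        }
        return sb.ToString();
    }
}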

The important bit is Convert.ToUInt32(c), which returns the numeric value of the UTF-16 code unit; for characters in the Basic Multilingual Plane, that value is the character's code point. The RTF escape for Unicode requires a decimal value. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian) as per the MSDN documentation.
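Because Convert.ToUInt32(c) works on individual UTF-16 code units, a character outside the BMP arrives as two surrogate values (0xD83D and 0xDE00 for U+1F600), and each is escaped on its own. The RTF specification also describes the \uN parameter as a signed 16-bit number, with values above 32767 written as negative numbers. The sketch below (RtfSignedEscape is a hypothetical variant, not the original answer's code) follows that convention; treat it as an assumption to check against the RTF readers you target.

```csharp
using System.Text;

// Hypothetical variant of GetRtfUnicodeEscapedString, for illustration only.
static class RtfSignedEscape
{
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);      // escape RTF's own special characters
            else if (c <= 0x7f)
                sb.Append(c);                   // plain ASCII passes through
            else
                // Reinterpret the 16-bit code unit as signed (assumed RTF convention),
                // so e.g. 0xD83D becomes -10179; '?' is the fallback character.
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
        }
        return sb.ToString();
    }
}
```

With this variant, "ë" still becomes \u235?, while U+1F600 becomes \u-10179?\u-8704?.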

Eric Smith 2009-09-02 14:39:12

Hmm, very interesting point. If that's true, then there is probably a mistake somewhere in my logic... and Ian Kemp's answer makes much more sense. I'll keep googling.

Emir 2009-09-02 15:21:57

Thank you for the example, it works!

Emir 2009-09-03 10:51:56
