ansaurus

Question

A better way of converting Codepage-1251 in RTF to Unicode

Answer 1


The Charset to codepage table is small enough, and static enough, that I doubt the system provides a function to do it.
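For reference, here is a minimal sketch of what the CharSetToCodePage helper used in the snippet further down might look like. Only the name comes from that snippet; the body is my assumption. It relies on RTF's \fcharsetN values sharing the numbering of the Windows *_CHARSET constants, and the fallback for unknown values (1252) is an arbitrary choice you may want to change.

```pascal
program CharSetDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: maps an RTF \fcharsetN value (same numbering as the
// Windows *_CHARSET constants) to a Windows code page.
function CharSetToCodePage(CharSet: Integer): Word;
begin
  case CharSet of
    0:   Result := 1252;           // ANSI_CHARSET
    1:   Result := Word(GetACP);   // DEFAULT_CHARSET: system ANSI code page
    2:   Result := 42;             // SYMBOL_CHARSET (CP_SYMBOL)
    77:  Result := 10000;          // MAC_CHARSET (Mac Roman)
    128: Result := 932;            // SHIFTJIS_CHARSET
    129: Result := 949;            // HANGEUL_CHARSET
    130: Result := 1361;           // JOHAB_CHARSET
    134: Result := 936;            // GB2312_CHARSET
    136: Result := 950;            // CHINESEBIG5_CHARSET
    161: Result := 1253;           // GREEK_CHARSET
    162: Result := 1254;           // TURKISH_CHARSET
    163: Result := 1258;           // VIETNAMESE_CHARSET
    177: Result := 1255;           // HEBREW_CHARSET
    178: Result := 1256;           // ARABIC_CHARSET
    186: Result := 1257;           // BALTIC_CHARSET
    204: Result := 1251;           // RUSSIAN_CHARSET
    222: Result := 874;            // THAI_CHARSET
    238: Result := 1250;           // EASTEUROPE_CHARSET
    255: Result := Word(GetOEMCP); // OEM_CHARSET: system OEM code page
  else
    Result := 1252;                // assumption: unknown charsets fall back to Western
  end;
end;

begin
  // \fcharset204 marks a Cyrillic (Russian) font
  WriteLn(CharSetToCodePage(204));
end.
```

A plain case statement keeps the table readable, and since the values never change there is little to gain from anything fancier.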

To do the actual character translation you can use the SysUtils.TEncoding class or the System.SetCodePage function. On Windows both end up in the MultiByteToWideChar API, which uses OS-provided lookup tables, so you don't need to maintain them yourself.
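If you would rather go the TEncoding route, a sketch might look like the following. DecodeRaw is a hypothetical name; TEncoding.GetEncoding and TEncoding.GetString are the standard SysUtils methods. Note that GetEncoding hands back a new instance, so it has to be freed.

```pascal
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: decodes a slice of raw RTF bytes with TEncoding
// instead of relabelling the string with SetCodePage.
function DecodeRaw(const Raw: RawByteString; CodePage: Word): string;
var
  Enc: TEncoding;
  Bytes: TBytes;
begin
  // Copy the raw 8-bit characters into a byte array without any conversion
  SetLength(Bytes, Length(Raw));
  if Length(Raw) > 0 then
    Move(Raw[1], Bytes[0], Length(Raw));
  // GetEncoding returns a new instance, so the caller must free it
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(Bytes);
  finally
    Enc.Free;
  end;
end;

const
  // "Привет" encoded in code page 1251
  PrivetCp1251: array[0..5] of Byte = ($CF, $F0, $E8, $E2, $E5, $F2);
var
  Raw: RawByteString;
  Text: string;
begin
  SetLength(Raw, Length(PrivetCp1251));
  Move(PrivetCp1251[0], Raw[1], Length(PrivetCp1251));
  Text := DecodeRaw(Raw, 1251);
  // Compare against the expected UTF-16 code points instead of printing the
  // text, since how Cyrillic shows up depends on the console's code page
  WriteLn(Text = #$041F#$0440#$0438#$0432#$0435#$0442);
end.
```

This costs an extra copy into TBytes compared with SetCodePage, but it keeps the code page handling explicit and works the same regardless of how the source string was declared.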

Using SetCodePage would look something like this:

```pascal
var
  iStart, iStop: Integer;
  RTF: AnsiString;
  RawText: RawByteString; // declared RawByteString to match SetCodePage's var parameter
  Text: string;
  CodePage: Word;
begin
   ...
   CodePage := CharSetToCodePage(CharSet);
   RawText := Copy(RTF, iStart, iStop - iStart);
   // Relabel the bytes with the font's code page (e.g. 1251 for Russian) without converting them
   SetCodePage(RawText, CodePage, False);
   // Casting to string converts from the string's code page to Unicode
   Text := string(RawText);
   ...
end;
```

Craig Peterson 2010-03-15 16:23:35

Thanks! The only thing I hadn't tried was setting the Convert parameter of SetCodePage to False and that proved to be the key.

blue painted 2010-03-15 16:37:25
