Hello,

I am loading text from a text file into a RichEdit, but it displays weird Chinese symbols instead. What am I doing wrong?

    // zChar is declared as Char; a terminating #0 is appended to the buffer
    ms := TMemoryStream.Create;
    try
      ms.LoadFromFile('C:\aw.txt');
      ms.Seek(0, soFromEnd);
      zChar := #0;
      ms.Write(zChar, 1);
      ms.Seek(0, soFromBeginning);
      RichEdit1.SetSelTextBuf(ms.Memory);
    finally
      ms.Free;
    end;
+3  A: 

Edit: I am revising my answer in light of the comments on the question, especially the hint about Delphi 7.

RichEdit is based on richedit.dll, which comes from Microsoft and ships with Windows. After Windows ME, it is Unicode-enabled, so it determines the character set by interpreting the first two bytes of the file as a byte order mark (BOM). In some cases, bytes at the start of an ASCII or ANSI file are misinterpreted as a BOM (such files carry no BOM, for compatibility reasons). The same behavior can be seen in write.exe.

Make sure you use the right encoding when saving the file in Notepad. If the file does not start with a byte order mark (look at the first two bytes in a binary viewer), try, if possible, adding two spaces to the front and see whether the problem persists.
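If no binary viewer is at hand, the same check can be sketched in code. This is only a minimal illustration, not part of the original answer: `DetectBom` is a hypothetical helper name, the path is the one from the question, and only the common UTF-8 and UTF-16 marks are recognized.

```delphi
program BomCheck;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: reports which byte order mark, if any, starts the file.
function DetectBom(const FileName: string): string;
var
  fs: TFileStream;
  buf: array[0..2] of Byte;
  n: Integer;
begin
  Result := 'none';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    FillChar(buf, SizeOf(buf), 0);
    // Read up to three bytes; n is the number actually read.
    n := fs.Read(buf, SizeOf(buf));
    if (n >= 3) and (buf[0] = $EF) and (buf[1] = $BB) and (buf[2] = $BF) then
      Result := 'UTF-8'
    else if (n >= 2) and (buf[0] = $FF) and (buf[1] = $FE) then
      Result := 'UTF-16 LE'
    else if (n >= 2) and (buf[0] = $FE) and (buf[1] = $FF) then
      Result := 'UTF-16 BE';
  finally
    fs.Free;
  end;
end;

begin
  // Path taken from the question; adjust as needed.
  WriteLn(DetectBom('C:\aw.txt'));
end.
```

A result of `none` suggests a plain ASCII or ANSI file, which is the case where the control has to guess the encoding.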

Delphi 2009 and 2010

I will leave my first answer in to help people when upgrading to Delphi 2009 and up:

My guess is that the text file has no encoding marker and is pure ASCII or ANSI, and that you are using Delphi 2009 or 2010, which is Unicode-enabled. The first two bytes will be taken as a byte order mark (BOM), which tells the program which Unicode encoding is used. If they happen to form a valid BOM, the wrong encoding may be applied.

TMemoryStream does not allow enforcement of encoding.

If possible, use TStrings, whose LoadFromFile method has a new TEncoding parameter. For example:

    RichEdit1.Lines.LoadFromFile('c:\test.txt', TEncoding.ASCII);

Have a look at this page as well: http://edn.embarcadero.com/article/38693

Ralph Rickenbach
It's more than just the first two characters if there's no mark there. Hence the popular gag of saving "Bush hid the files" in Notepad: when you read it back, you get gibberish Chinese. (The words actually don't matter; it's the lengths of the words.)
Loren Pechtel
See also "The Old New Thing": "The Notepad file encoding problem, redux": http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx Basically, it is a problem you cannot fix.
Jeroen Pluimers