Hello folks,

I'm porting an ISAPI application (built on page producers) from Delphi 7 to Delphi 2009. The pages are based on HTML files encoded as UTF-8.

Everything works except when OnHTMLTag fires and I replace a transparent tag with a value containing special characters, such as accented letters (áé...). Those characters are replaced in the output with a � character.

What's wrong?

+4  A: 

As part of your debugging procedure, you should go find out exactly what byte value(s) the browser receives for the question-mark character.

As you should know, Delphi 2009's string type is Unicode, whereas in all previous versions it was ANSI. Delphi 7 introduced the Utf8String type, but Delphi 2009 made that type special. If you're not using that type for holding strings that are encoded as UTF-8, then you should start doing so. Values held in Utf8String variables will be converted to UnicodeString values automatically when you assign one to the other.

If you're storing your UTF-8-encoded strings in ordinary AnsiString variables, then they will be converted to Unicode using the default system code page if you assign them to a UnicodeString. That's not what you want.

If you're assigning UTF-8-encoded literals to variables of type string, stop that. That type expects its values to be encoded as UTF-16, just like WideString always has.
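To make the difference between the two 8-bit string types concrete, here is a minimal console sketch, assuming Delphi 2009 or later. The program and variable names are made up for illustration. It encodes one accented character as UTF-8, converts it back through a UTF8String, and then pushes the same bytes through a plain AnsiString to show where the damage would happen.

```delphi
program Utf8StringDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Original, ViaUtf8, ViaAnsi: string;
  Utf8: UTF8String;
  Ansi: AnsiString;
begin
  // One UTF-16 code unit: U+00E9, "e" with an acute accent.
  Original := Char($00E9);

  // Converting to UTF8String re-encodes UTF-16 as UTF-8 (bytes $C3 $A9).
  Utf8 := UTF8String(Original);
  Writeln('UTF-8 byte count: ', Length(Utf8));

  // Converting back decodes the bytes as UTF-8, so the text survives.
  ViaUtf8 := string(Utf8);
  Writeln('Intact via UTF8String: ', ViaUtf8 = Original);

  // Copy the same raw bytes into a plain AnsiString. This mimics
  // UTF-8 data that has been stored in the wrong string type.
  SetLength(Ansi, Length(Utf8));
  Move(Pointer(Utf8)^, Pointer(Ansi)^, Length(Utf8));

  // This conversion decodes the bytes with the system ANSI code page,
  // so the result depends on the machine it runs on.
  ViaAnsi := string(Ansi);
  Writeln('Intact via AnsiString: ', ViaAnsi = Original);
end.
```

On a machine whose ANSI code page is something like Windows-1252, the last comparison should come out false, because the two UTF-8 bytes are decoded as two separate characters.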

If you are loading your files into a TStrings descendant with LoadFromFile, then you need to start using that method's second parameter, which tells it what encoding to use. UTF-8-encoded files should use TEncoding.UTF8. Without that parameter, the method looks for a byte-order mark and, if it finds none, falls back to TEncoding.Default, which is the system ANSI code page.
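A minimal sketch of loading a file with an explicit encoding, assuming Delphi 2009 or later. The file name template.html is hypothetical, so substitute the path of one of your own UTF-8 pages.

```delphi
program LoadUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Template: TStringList;
begin
  Template := TStringList.Create;
  try
    // 'template.html' is a placeholder name for a UTF-8-encoded page.
    // Passing TEncoding.UTF8 makes the decoding explicit instead of
    // relying on byte-order-mark detection or the system code page.
    Template.LoadFromFile('template.html', TEncoding.UTF8);
    Writeln('Lines loaded: ', Template.Count);
  finally
    Template.Free;
  end;
end.
```

The same two-argument call should work on any TStrings descendant, since the overload is declared on TStrings itself.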

Rob Kennedy
+1, wish I could give +2. Compact and informative.
Argalatyr
Thanks Rob, the last paragraph is exactly the solution for my problem.
Francis Lee
Actually, it was Delphi 6 that introduced UTF8String.
Remy Lebeau - TeamB
A: 

This is probably a character encoding issue.

The Delphi IDE usually uses Windows-1252 or UTF-16 to encode source code. HTML often uses UTF-8.

You probably need some conversion between those encodings. For that, you need to find out exactly which encodings are used (as Rob mentions).

Or fall back to HTML-escaping the accented characters (as Ralph mentions).

Can you post a small app that shows the problem? (You can also email me: just about anything with jeroen in the username and pluimers.com in the domain name will arrive in my mailbox.)

--jeroen

Jeroen Pluimers
A: 

Hello again,

Thank you for your help. After some testing, the problem turned out to be very simple (or stupid, even):

Response.ContentType := 'text/html; charset=UTF-8';

No need to convert manually between UnicodeString, UTF8String, AnsiString, and WideString. Delphi 2009's string handling is close to perfect.

Francis Lee