It seems my data is getting corrupted when I use HTTPApp.HTMLEncode( string ): String;

HTMLEncode( 'Jo&hn D<oe' ); // returns 'Jo&am'

This is not correct, and it is corrupting my data. Does anyone have suggestions for VCL components that work better, other than spending my time encoding all the cases myself?

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

Update

After learning more about HTML, I found there is no need to encode the other characters referenced in my link. You only need to handle the four HTML reserved characters:

&,<,>,"

The problem with the VCL HTTPApp.HTMLEncode( ) function comes from the buffer sizes it passes, which were not adjusted for the Unicode default string type introduced in Delphi 2009/2010. It can be fixed the way @mason describes below, or by calling WideFormatBuf( ) instead of the FormatBuf( ) that is currently in use.

+1  A: 

You're probably using Delphi 2009 or 2010. It looks to me like they forgot to update HTMLEncode for Unicode. It's passing the wrong buffer lengths to FormatBuf.

The HTMLEncode routine is basically right aside from that, and it's pretty short, so you could probably just make your own copy. Every call to FormatBuf passes 5 parameters, and the second and fourth are integer values. Double both of them in each call (there are only four calls), and it will work.

Also, you ought to open a QC report on this so it will get fixed.

Mason Wheeler
I see in their code that the only characters they look for are the ones in the Convert set. Should this be extended to accommodate all the HTML special characters? The same issue could be fixed with a call to WideFormatBuf as well, but my concern is the plethora of extra characters that could be considered special: math signs, international characters, etc.
wfoster
+4  A: 

Replacing the <, >, &, and " characters in a string is trivial. You could thus easily write your own routine for this. (And if your HTML page is UTF-8, there is absolutely no reason to encode any other characters, such as U+222B (the integral sign).)
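To illustrate how small such a routine can be, here is a minimal sketch. The name EscapeHtml is made up for this example and is not part of the RTL or VCL; it assumes the page is served as UTF-8, so only the four reserved characters are replaced and everything else passes through untouched.

```pascal
program EscapeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not an RTL routine): escapes only the four
// reserved characters &, <, > and ". All other characters, including
// the single quote, are copied unchanged.
function EscapeHtml(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    case S[I] of
      '&': Result := Result + '&amp;';
      '<': Result := Result + '&lt;';
      '>': Result := Result + '&gt;';
      '"': Result := Result + '&quot;';
    else
      Result := Result + S[I];
    end;
end;

begin
  // The input from the question; prints Jo&amp;hn D&lt;oe
  Writeln(EscapeHtml('Jo&hn D<oe'));
end.
```

Building the result character by character avoids any fixed-size buffer, which is where the HTTPApp version goes wrong. For very long strings a preallocated buffer or a TStringBuilder would likely be faster, but that is an optimization, not a correctness issue.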

But if you wish to stick to the Delphi RTL, you can have a look at HTTPUtil.HTMLEscape, which has exactly the same signature as HTTPApp.HTMLEncode.

Or, have a look at this SO question.

Andreas Rejbrand
Stijn Sanders
+1  A: 

Small hint: do not convert the single quote (') to &apos;. Some browsers do not understand this code, because &apos; is not valid HTML (it is an XML entity that HTML 4 does not define).

For details, see: "The Curse of &apos;" and "XHTML and '"

(Neither of the Delphi units mentioned converts single quotes.)

mjustin