I'm having a bit of bother URL encoding a string of UTF-8 encoded text to pass over HTTP. I am using Server.URLEncode in classic ASP (VBScript) to encode the "é" character.

It produces the following string,

%C3%83%C2%A9

The system I am talking to over HTTP is PHP, however, and it cannot decode this string. Using a PHP encoder at http://www.albionresearch.com/misc/urlencode.php, the same character encoded with the PHP URL encoding method comes out as,

%E9

Does anyone know how I can successfully encode my UTF-8 encoded strings in my ASP so that the PHP system can successfully decode them?

+2  A: 

%C3%83%C2%A9

That's encoded too much: the string is written as UTF-8, read back in as ISO-8859-1, then written as UTF-8 again before being hex-encoded!

%E9

That's encoded too little: the string is written out as plain ISO-8859-1 and hex-encoded. This is fine if the PHP script you are talking to is expecting ISO-8859-1, but modern web systems should be talking UTF-8, in which case the sequence you want is:

%C3%A9

(That's encoded just right!)
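If it helps to see where each of the three strings comes from, here is a minimal sketch in Python (not ASP, purely to illustrate the byte-level steps described above). The variable names are my own, and the "too much" case assumes the misreading happened as ISO-8859-1, which gives the same result as cp1252 for this character.

```python
from urllib.parse import quote

text = "é"  # U+00E9

# Too little: one ISO-8859-1 byte (0xE9), then percent-encoded.
too_little = quote(text.encode("iso-8859-1"))

# Just right: two UTF-8 bytes (0xC3 0xA9), then percent-encoded.
just_right = quote(text.encode("utf-8"))

# Too much: the UTF-8 bytes are misread as ISO-8859-1 (giving two
# characters), and that string is written out as UTF-8 again.
misread = text.encode("utf-8").decode("iso-8859-1")
too_much = quote(misread.encode("utf-8"))

print(too_little)  # %E9
print(just_right)  # %C3%A9
print(too_much)    # %C3%83%C2%A9
```

So the output you are seeing is consistent with a string that was already misread once before it reached the URL encoder.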

I am using Server.URLEncode in classic ASP

Classic ASP has, unfortunately, some serious deficiencies in processing Unicode. You can set @ CODEPAGE=65001 (and Response.Charset="UTF-8") to produce UTF-8 pages, but your internal string type is still encoded in the system codepage, and any data grabbed from form submissions or the database will be read into that encoding.

So you can URLEncode() a literal Chr(233) and get the correct output, but if you're getting the data from a UTF-8-encoded ‘é’ in a form submission, you'll end up with ‘Ã©’: the UTF-8 sequence misinterpreted as ISO-8859-1 (actually cp1252, the Windows equivalent).

bobince