ansaurus

Question

Problem with XML encoding of database contents with Latin characters

Answer 1

+1 A:

How do you know the XML is actually UTF-8 encoded? I don't know the MS environment well, but in Java a common mistake is to assume that writing encoding="UTF-8" in the XML declaration is enough to make the output UTF-8. The declaration is only a label; you also have to configure the writer to actually emit UTF-8 bytes.

You said Wireshark shows hex D6 for the character. A lone 0xD6 byte is the ISO-8859-1 encoding of "Ö", so the stream is NOT actually UTF-8 encoded, regardless of what the declaration says.
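
To illustrate the mismatch described above, here is a minimal Python sketch (not the original ASP code). The sample element `<name>Österreich</name>` and the helper `parses` are hypothetical and used only for illustration. It builds one document that declares UTF-8, serializes it once as ISO-8859-1 and once as UTF-8, and checks which byte stream a standard XML parser accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the declaration claims UTF-8 in both cases.
DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'
document = DECLARATION + "<name>Österreich</name>"

# Same text, two different byte encodings.
mislabelled = document.encode("iso-8859-1")  # "Ö" becomes the single byte 0xD6
correct = document.encode("utf-8")           # "Ö" becomes the two bytes 0xC3 0x96


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print("declared UTF-8, written as ISO-8859-1:", parses(mislabelled))
print("declared UTF-8, written as UTF-8:     ", parses(correct))
```

The first stream is rejected because the parser trusts the declaration and then meets a byte sequence that is not valid UTF-8, which is consistent with a strict client such as IE complaining about the response.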

Jim Garrison 2010-05-28 20:52:51

Answer 2

A:

Well, I'm not entirely sure why, but I was able to get it working. Prompted by Jim's comments, I changed the XML declaration and the response encoding back from ISO-8859-1 to UTF-8, and did the same for the encoding in the META tag of the pages.

It now works without complaint in IE, and the browsers now display the correct characters.

I also checked the raw bytes with Wireshark this time, and the "Ö" character is now encoded in the XML as 2 bytes (0xC3, 0x96) instead of the single byte 0xD6.
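
The byte values seen in Wireshark can be reproduced with a short Python sketch. The helper names `hex_bytes` and `is_valid_utf8` are made up for this example; it shows how "Ö" is encoded under each charset and why a lone 0xD6 cannot be valid UTF-8.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(hex_bytes("Ö", "iso-8859-1"))   # D6
print(hex_bytes("Ö", "utf-8"))        # C3 96
print(is_valid_utf8(b"\xd6"))         # False: lead byte with no continuation byte
print(is_valid_utf8(b"\xc3\x96"))     # True
```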

So in summary:

In the server-side ASP code that generates the XML declaration:

return ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>") ;

In the server-side ASP code that sends the response itself:

Response.ContentType = "text/xml; charset=UTF-8" ;
Response.Write (XMLResponse) ;

and in the web page header:

<head>
  <meta http-equiv="Content-type" content="text/html; charset=UTF-8">

Many thanks Jim.

2010-05-29 01:01:06
