I have XML in which some of the element values contain Unicode characters. Is it possible to represent these in an ANSI encoding?

E.g.

<?xml version="1.0" encoding="utf-8"?>
<xml>
<value>受</value>
</xml>

to

<?xml version="1.0" encoding="Windows-1252"?>
<xml>
<value>&#27544;</value>
</xml>

I deserialize the XML and then attempt to serialize it using XmlTextWriter, specifying Encoding.Default (which is Windows-1252 here). All the Unicode characters end up as question marks. I'm using VS 2008 with .NET 3.5 (C# 3.0).

+3  A: 

If I understand the question, then yes. You just need a ; after the 27544:

<?xml version="1.0" encoding="Windows-1252"?>
<xml>
<value>&#27544;</value>
</xml>

Or are you wondering how to generate this XML programmatically? If so, what language/environment are you working in?
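As a side note on where the number in such a reference comes from: a numeric character reference is just the character's Unicode code point, written in decimal (&#N;) or in hex (&#xN;). Here is a minimal sketch in C#. The class and method names (CharRefDemo, ToDecimalReference, ToHexReference) are made up for illustration and are not part of any library.

```csharp
using System;

public static class CharRefDemo
{
    // Decimal numeric character reference for the first code point in s.
    public static string ToDecimalReference(string s)
    {
        int codePoint = char.ConvertToUtf32(s, 0);
        return "&#" + codePoint + ";";
    }

    // Hexadecimal numeric character reference for the first code point in s.
    public static string ToHexReference(string s)
    {
        int codePoint = char.ConvertToUtf32(s, 0);
        return "&#x" + codePoint.ToString("X") + ";";
    }

    public static void Main()
    {
        // "\u53D7" is the character from the question, written as an escape
        // so the source file encoding does not matter.
        Console.WriteLine(ToDecimalReference("\u53D7")); // &#21463;
        Console.WriteLine(ToHexReference("\u53D7"));     // &#x53D7;
    }
}
```

For 受 (U+53D7) this gives &#21463; or &#x53D7;, so the 27544 in the example above appears to point at a different character.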

Blair Conrad
Was a typo on my part. Corrected the example.
Richard Nienaber
+5  A: 

Okay, I tested it with the following code:

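 // Requires: System.IO, System.Text, System.Xml, System.Xml.Linq
 string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><xml><value>受</value></xml>";

 // Encoding.Default is the system ANSI code page (Windows-1252 on my machine).
 XmlWriterSettings settings = new XmlWriterSettings { Encoding = Encoding.Default };
 MemoryStream ms = new MemoryStream();
 using (XmlWriter writer = XmlWriter.Create(ms, settings))
 {
     XElement.Parse(xml).WriteTo(writer);
 }

 // Decode the bytes with the same encoding to inspect the result.
 string value = Encoding.Default.GetString(ms.ToArray());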

It correctly escaped the Unicode character:

<?xml version="1.0" encoding="Windows-1252"?><xml><value>&#x53D7;</value></xml>

I must be doing something wrong somewhere else. Thanks for the help.
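One possible explanation, which I have not confirmed against the original code: a writer built with the XmlTextWriter constructor hands characters straight to the target encoding, which substitutes ? for anything it cannot represent, whereas a writer obtained from XmlWriter.Create with XmlWriterSettings falls back to character references. The sketch below puts the two side by side. EscapeDemo and its method names are hypothetical, and code page 1252 is assumed to be available, as it is on the .NET Framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EscapeDemo
{
    // Serializes via the XmlTextWriter constructor (the approach described in the question).
    public static string WithXmlTextWriter(XElement root, Encoding enc)
    {
        MemoryStream ms = new MemoryStream();
        using (XmlTextWriter writer = new XmlTextWriter(ms, enc))
        {
            writer.WriteStartDocument();
            root.WriteTo(writer);
        }
        // ToArray still works after the writer has closed the stream.
        return enc.GetString(ms.ToArray());
    }

    // Serializes via XmlWriter.Create with the encoding set in XmlWriterSettings.
    public static string WithXmlWriterCreate(XElement root, Encoding enc)
    {
        XmlWriterSettings settings = new XmlWriterSettings { Encoding = enc };
        MemoryStream ms = new MemoryStream();
        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            root.WriteTo(writer);
        }
        return enc.GetString(ms.ToArray());
    }

    public static void Main()
    {
        // Assumption: code page 1252 is built in (true on the .NET Framework).
        // On .NET Core / .NET 5+ it typically has to be enabled first with
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Encoding ansi = Encoding.GetEncoding(1252);
        XElement root = XElement.Parse("<xml><value>\u53D7</value></xml>");

        Console.WriteLine(WithXmlTextWriter(root, ansi));
        Console.WriteLine(WithXmlWriterCreate(root, ansi));
    }
}
```

If the first line shows a ? inside <value> and the second shows a character reference, then switching the serialization code from new XmlTextWriter(...) to XmlWriter.Create(...) may be all that is needed.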

Richard Nienaber