ansaurus

Question

How do I preserve special characters when writing XML with XDocument.Save()?

Answer 1

+2 A:

I strongly suspect you won't be able to do this. Fundamentally, the copyright sign and its character reference (`&#169;`, say) are different representations of the same thing, and I expect that the in-memory representation normalizes this.

What are you doing with the XML afterwards? Any sane application processing the resulting XML should be fine with it.

You may be able to persuade it to use the entity reference if you explicitly encode it with ASCII... but I'm not sure.

EDIT: You can definitely make it use a different encoding. You just need a StringWriter which reports that its "native" encoding is UTF-8. Here's a simple class you can use for that:

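using System.IO;
using System.Text;

// A StringWriter that reports UTF-8 as its encoding, so the XML declaration says utf-8.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}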

You could try changing it to use Encoding.ASCII as well and see what that does to the copyright sign...

Jon Skeet 2010-08-18 13:57:33
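For anyone wanting to try the class above, here is a minimal sketch of how it might be wired into XDocument.Save(). The names Utf8SaveSketch and SaveToString and the sample document are made up for illustration; only Utf8StringWriter comes from the answer. It only affects the encoding named in the XML declaration and does not by itself bring back entity references.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// From the answer: a StringWriter that reports UTF-8 as its encoding.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Hypothetical helper, not part of the original answer.
public static class Utf8SaveSketch
{
    // Saves the document into a string via the UTF-8-reporting writer.
    public static string SaveToString(XDocument doc)
    {
        using (var writer = new Utf8StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // Sample document (an assumption); \u00A9 is the copyright sign.
        var doc = new XDocument(new XElement("note", "Copyright \u00A9 2010"));
        Console.WriteLine(SaveToString(doc));
    }
}
```

With a plain StringWriter the declaration would name utf-16 instead, which is the difference reported further down the thread.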

I'm writing a tool that runs through an xml file and adds attributes that, according to a set of business rules that I have, are missing or invalid. Then, I want to spit the new xml back out. I do not have control over the system that ultimately reads the xml, so I wanted my own footprint to be the absolute minimum. I do not know if this other system is sane. I want to assume that it isn't.

Chris 2010-08-18 14:08:58

Added more information beside "Update"

Chris 2010-08-18 14:10:52

@Chris: Edited due to the update. I wouldn't immediately assume that the reading system is badly written unless you have reason to. Can you not try the existing output first? It really should be equivalent.

Jon Skeet 2010-08-18 14:14:53

Your Utf8StringWriter fixed the declaration output. Thank you! I am still trying to find a way to preserve the special characters, though. I did 2 more updates showing some others that seem really wrong, whereas the copyright symbol seems like it could be right.

Chris 2010-08-18 14:23:51

I tried ASCII and every other encoding I could think of and still had no luck.

Chris 2010-08-18 14:24:38
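One variation that may be worth a try, assuming the ASCII attempts so far went through a TextWriter: hand XmlWriter a Stream together with XmlWriterSettings.Encoding set to Encoding.ASCII. As far as I know, XmlWriter then escapes characters the target encoding cannot represent as numeric character references. A minimal sketch under that assumption follows; AsciiSaveSketch and SaveAsAscii are made-up names, and the references come out in whatever form XmlWriter picks, which is not necessarily the form used in the original file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not something posted in the thread.
public static class AsciiSaveSketch
{
    // Writes the document to an in-memory stream as ASCII and returns the text.
    public static string SaveAsAscii(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII, // forces non-ASCII characters to be escaped
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            // Dispose the XmlWriter first so everything is flushed to the stream.
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        // Sample document (an assumption); \u00A9 is the copyright sign.
        var doc = new XDocument(new XElement("note", "Copyright \u00A9 2010"));
        Console.WriteLine(SaveAsAscii(doc));
    }
}
```

This keeps the output pure ASCII, but it cannot tell which characters were written as references in the input, so every non-ASCII character is escaped, including any that were literal in the source file.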

@Chris: I suspect the problem you're seeing in terms of other characters is that you're opening the file with a text editor which is assuming a different character encoding. What happens if you open it in a genuine XML editor?

Jon Skeet 2010-08-18 14:26:39

Good idea. If I open the output XML in XMLSpy, it pops up a dialog that says "Your file contains 1 character(s) that should not be present in a file using the Unicode UTF-8 encoding... The offending characters are `ÿ (0xFF)`." It is referring to what gets swapped in for that ` `.

Chris 2010-08-18 14:31:52

@Chris: Hmm. What does that bit of the file look like in binary, out of interest?

Jon Skeet 2010-08-18 14:33:15

I used the class you specified to get the declaration to be accurate. Then I did a crazy hack to preserve all special characters, using regular expressions.

Chris 2010-08-20 05:41:00

Answer 2

A:

Maybe you can try a different document encoding. Check out: http://www.sagehill.net/docbookxsl/CharEncoding.html

Ivo 2010-08-18 14:02:39

Answer 3

A:

kbrimington 2010-08-18 14:14:32

There are bunches of other ones, `¶` and ` ` to name 2 more.

Chris 2010-08-18 14:19:39
