In working through the issues with another question, I've found text files with embedded Ctrl-S characters (decimal 19) in them. When adding their text to an XML document, an exception is thrown (C#/.NET).

According to this page, as I read it, they are not in the ranges for a "Char" and so are illegal: http://www.w3.org/TR/REC-xml/#charsets
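As a rough illustration of that reading, here is a minimal Python sketch of the Char production from the XML 1.0 Recommendation (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function name `is_xml_char` is made up for this example and is not part of any library.

```python
def is_xml_char(code_point: int) -> bool:
    """Return True if code_point falls inside the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


print(is_xml_char(19))  # Ctrl-S (decimal 19, hex 0x13) -> False
print(is_xml_char(13))  # carriage return (decimal 13, hex 0xD) -> True
```

Under this check, Ctrl-S falls outside every allowed range, while the carriage return is one of the three permitted control characters.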

However, a cohort found an XML specification that implies it is equivalent to a carriage return (decimal 13, not hex 0x13), here: http://www.w3.org/1999/07/WD-xml-c14n-19990729#charescaping

But then, the paragraph in question is not in the more recent version (http://www.w3.org/TR/xml-c14n) which explicitly states:

  • In character data, the carriage-return (#xD) character is represented by "&#xD;".

So, two questions:

  1. Am I missing something, or is there a typo on the W3C page -- an "x" in the token "&#x13;" where it should be "&#13;" or "&#xD;"?
  2. When a specification has an error (not just something that changed but an actual error), does the W3C leave the document accessible? Seems like that's a "yes"
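
To make the decimal/hex distinction in the first question concrete, here is a small Python sketch that decodes a numeric character reference to its code point. The helper name `char_ref_value` is hypothetical; it follows the reference syntax from the XML 1.0 Recommendation ("&#" digits ";" or "&#x" hex digits ";") and does nothing else.

```python
import re

# Decimal form: &#NNN;   Hex form: &#xHHH; (the "x" must be lowercase)
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def char_ref_value(ref: str) -> int:
    """Return the code point named by a numeric character reference."""
    match = _CHAR_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    body = match.group(1)
    # A leading "x" means the digits are hexadecimal; otherwise decimal.
    return int(body[1:], 16) if body.startswith("x") else int(body)


print(char_ref_value("&#x13;"))  # 19, Ctrl-S
print(char_ref_value("&#13;"))   # 13, carriage return
print(char_ref_value("&#xD;"))   # 13, carriage return
```

So "&#x13;" names code point 19, not the carriage return; only "&#13;" and "&#xD;" name code point 13.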
A: 

You have a single character whose value is (decimal) 19, right? That XML spec is talking about character escapes. If that character were legal in XML, it could be escaped as &#19; or as &#x13;. But it wasn't. And even if it were escaped, the escaped version would not be legal either.
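
One way to check that claim, sketched here with Python's standard-library parser rather than the .NET classes from the question (other parsers may word their errors differently). The helper `parses` is invented for this example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<a>\x13</a>"))    # raw Ctrl-S character in the content
print(parses("<a>&#x13;</a>"))  # the same character as a numeric reference
print(parses("<a>&#xD;</a>"))   # carriage return as a numeric reference
```

With this parser, both the raw character and its escaped form are rejected, while the carriage-return reference is accepted.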

Paul Clapham
Notice this sentence in the document at the second link: **Where a document contains the string "&#x13;", the information set contains a single CR (#xD) character.**
NVRAM
My question is about the older documents published by W3C: *does the old one have a (blatant) error, and if so why do they not correct that?*
NVRAM
Why are you drawing our attention to that sentence? Your document doesn't contain that string, according to what you originally said. It contains a single decimal-19 character. So it's irrelevant. I have no idea whether the W3C document has a blatant error, and anyway that's irrelevant to your question too. You should post a separate question if you want to follow that up. My guess would be that it doesn't, but again it's irrelevant to your question.
Paul Clapham
I'm composing XML from text documents (some of which have ^S). As my first attempt to avoid the exception, and sidestep the penalty of cleaning the data, I set the C# *XmlWriterSettings.CheckCharacters* property to false, so the *XmlWriter* did in fact write a **&#x13;** to the output *XML document*. After that change my cohort found the W3C document with what I believe is an error. As to relevance, please re-read the title of this post; my question is *solely* about the W3C documentation.
NVRAM
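For reference, the "cleaning the data" alternative mentioned in the comment above could look roughly like the following Python sketch, which drops every character outside the XML 1.0 Char ranges before the text is written. The name `strip_illegal_xml_chars` and the choice to delete (rather than replace) the offending characters are assumptions made for illustration.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace characters that XML 1.0 forbids (such as Ctrl-S)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


cleaned = strip_illegal_xml_chars("before\x13after\r\n")
print(repr(cleaned))
```

Passing a visible `replacement` such as "?" instead of the empty string keeps a trace of where characters were removed.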
I see. You didn't mention that detail until now. As for the W3C documentation, I don't care either.
Paul Clapham
+2  A: 

Sure looks like a typo to me. But a typo in the 1999 Canonical XML working draft doesn't seem like an occasion to get too terribly excited.

It's called a "working draft" for a reason. The difference between a working draft and the published recommendation can be considerable, as anyone who used XSL-WD learned to their dismay. The W3C doesn't fix typos in drafts they've published; they publish new versions. That's something that happens pretty slowly. Very slowly, in the case of Canonical XML, which addresses a problem that the world does not appear in desperate need of solving.

Robert Rossney