tags:

views:

854

answers:

2

On this data:

<row Id="37501" PostId="135577" Text="...uses though.&#x10;"/>

I'm getting an error with the Python sax parser:

xml.sax._exceptions.SAXParseException:
comments.xml:29776:332: reference to invalid character number

I trimmed the example; 332 points to "&#x10;".

Is the parser correct in rejecting this character?

+2  A: 

&#10; is the linefeed character, which seems to be the intent.

&#x10; would be the same as &#16; (10 hex is 16 decimal) and would refer to the DLE (data link escape) character.

DLE is a transmission control character used to control the interpretation of data being transmitted.

Naaff
+9  A: 

As others have stated, you probably meant &#10;. The reason why &#x10; (0x10 = 10h = 16) is invalid is that it's explicitly excluded by the XML 1.0 standard: (http://www.w3.org/TR/xml/#NT-Char)

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Roger Pate
If your libraries support it, XML 1.1 has a wider range of legal characters - http://www.w3.org/TR/xml11/#charsets
Lachlan Roche
Thanks Lachlan. Edited to make XML 1.0 explicit.
Roger Pate
Yep, I double-checked the header, it's version="1.0". thanks!
Mark Harrison