ansaurus

Question

Answer 1 (score +2)

&#0; is not in the legal character range defined by the XML spec. Alas, my Python skills are pretty rudimentary, so I'm not much help there.

McDowell 2010-06-14 16:13:38

Hm, yes, the specification makes it quite clear. Thank you for the exact reference.

clacke 2010-06-14 16:35:44
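The legal range McDowell refers to is the `Char` production of the XML 1.0 specification: #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. As a rough illustration (not code from either poster), here is a minimal sketch of that production as a Python predicate, plus a check that the standard library's `xml.parsers.expat` rejects a reference to U+0000. The helper names `is_xml_char` and `expat_accepts` are hypothetical.

```python
import xml.parsers.expat

def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def expat_accepts(data: bytes) -> bool:
    """Return True if Expat parses data as a complete document without error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True

print(is_xml_char(0))                        # False: U+0000 is outside every range
print(expat_accepts(b"<root>&#0;</root>"))   # False: Expat rejects the reference
print(expat_accepts(b"<root>&#65;</root>"))  # True: &#65; is "A"
```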

Answer 2 (score +1)

&#0; is not a valid XML character. Ideally, you'd be able to get the creator of the file to change their process so that it no longer produces invalid files like this.

If you must accept these files, you could pre-process them to turn &#0; into something else. For example, pick @ as an escape character, turn "@" into "@@", and "&#0;" into "@0".

Then, as you get the text data from the parser, you can reverse the mapping. This is just an example; you can invent any escaping syntax you like.

Ned Batchelder 2010-06-14 16:23:54
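A minimal sketch of the escaping idea from this answer, assuming the only offending reference is the literal string `&#0;` (other spellings such as `&#00;` or `&#x0;` would need extra patterns). The helper names `escape_nuls`, `unescape_nuls` and `parse_text` are hypothetical. Character data is collected from Expat first and unescaped once at the end, because Expat may split text across several handler calls and an `@0` pair could straddle two chunks.

```python
import re
import xml.parsers.expat

def escape_nuls(raw: str) -> str:
    """Pre-process raw XML: double every '@', then turn '&#0;' into '@0'."""
    # Doubling '@' first guarantees that any '@0' produced below is unambiguous.
    return raw.replace("@", "@@").replace("&#0;", "@0")

def unescape_nuls(text: str) -> str:
    """Reverse the mapping on parsed text: '@@' -> '@', '@0' -> U+0000."""
    # re.sub scans left to right, so '@@0' decodes as '@' followed by '0'.
    return re.sub(r"@([@0])", lambda m: "@" if m.group(1) == "@" else "\x00", text)

def parse_text(raw: str) -> str:
    """Parse the escaped document with Expat and return its unescaped text."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(escape_nuls(raw).encode("utf-8"), True)
    # Join before unescaping: Expat may deliver text in several pieces.
    return unescape_nuls("".join(chunks))

doc = "<root>user@example.com&#0;@0</root>"
print(repr(parse_text(doc)))  # 'user@example.com\x00@0'
```

Note that the "@" doubling also applies to attribute values, so those would need the same `unescape_nuls` pass if they matter to you.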

In my particular case, I could just delete them; they are in an irrelevant element of the XML. It feels shaky to use text processing to handle XML, but since the file isn't well-formed, I guess I have no choice. Using some sort of tag-soup parser seems like overkill.

clacke 2010-06-14 16:41:45
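Since the NUL references sit in an element whose content isn't needed, the simplest pre-processing step is to delete them before handing the bytes to Expat. A minimal sketch, assuming the references only appear as decimal or hex references to zero (`&#0;`, `&#00;`, `&#x0;`); `strip_nul_refs` and `parse` are hypothetical helper names. Like any regex over markup, this also rewrites matches inside comments or CDATA sections, where `&#0;` is ordinary literal text.

```python
import re
import xml.parsers.expat

# Decimal (&#0;, &#00;, ...) or hex (&#x0;, &#x00;, ...) references to U+0000.
NUL_REF = re.compile(rb"&#(?:0+|x0+);")

def strip_nul_refs(data: bytes) -> bytes:
    """Remove character references to U+0000 from raw XML bytes."""
    return NUL_REF.sub(b"", data)

def parse(data: bytes) -> list:
    """Parse with Expat after stripping, returning the element names seen."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(strip_nul_refs(data), True)
    return names

raw = b"<doc><junk>a&#0;b&#x00;c</junk><keep>ok</keep></doc>"
print(strip_nul_refs(raw))  # b'<doc><junk>abc</junk><keep>ok</keep></doc>'
print(parse(raw))           # ['doc', 'junk', 'keep']
```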

Python + Expat: Error on &#0; entities
