views:

748

answers:

2

Hey all,

I have downloaded the XML dump of the Stack Overflow site. While loading the dump into a MySQL database, I keep running into the following error: Got an Exception: Character reference "some character set like &#x10" is an invalid XML character.

I used UltraEdit (it is an 800 MB file) to remove some of these characters from the file, but each time I remove one invalid character reference and rerun the parser, I get a new error identifying more invalid characters. Any suggestions on how to solve this?

Cheers all,

j

+1  A: 

The set of characters permitted in XML is here. As you can see, #x10 is not one of them. If such characters are present in the Stack Overflow dump, then the dump is not valid XML.

Alternatively, you're reading the XML using the wrong character encoding.
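
If the dump really does contain disallowed characters, removing them one at a time in an editor will not scale, so a small filter run once over the whole file may be easier. The following is a minimal sketch in Python, not anything from the dump's tooling. It assumes the file is UTF-8 and that dropping the offending characters is acceptable. The helper names (`is_valid_xml_char`, `clean_line`, `clean_file`) are made up for illustration. The allowed ranges follow the XML 1.0 `Char` production: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.

```python
import re


def is_valid_xml_char(cp: int) -> bool:
    """Return True if the code point is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Matches numeric character references such as &#x10; (hex) or &#16; (decimal).
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _drop_invalid_ref(match: re.Match) -> str:
    body = match.group(1)
    cp = int(body[1:], 16) if body.startswith("x") else int(body)
    # Keep references to legal characters untouched; drop the rest.
    return match.group(0) if is_valid_xml_char(cp) else ""


def clean_line(line: str) -> str:
    """Remove invalid character references and invalid raw characters."""
    line = _CHAR_REF.sub(_drop_invalid_ref, line)
    return "".join(ch for ch in line if is_valid_xml_char(ord(ch)))


def clean_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Stream the file line by line so an 800 MB dump never sits in memory."""
    with open(src_path, encoding=encoding, errors="replace") as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            dst.write(clean_line(line))
```

For example, `clean_file("dump.xml", "dump.clean.xml")` (both file names are placeholders) writes a filtered copy that the parser can then be pointed at. If the errors come from reading the file with the wrong encoding instead, this filter will not help and the `encoding` argument is the thing to check first.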

skaffman
+1  A: 
Jon Skeet
I'm using the first fecking dump; I'll try again with the second one tonight. Thanks for your help.
slotishtype