XML Exception: Invalid Character(s)

A:

If your input is not XML, you should use something like Tidy or Tagsoup to clean the mess up.

They take almost any input and try to build a usable DOM from it.

I don't know what the equivalent libraries on the dark side are called.

alamar 2009-05-12 19:10:23

+3 A:

Would something as described in this blog post be helpful?

Basically, he creates a sanitizing xml stream.

Richard Morgan 2009-05-12 19:13:09

Actually, he's processing the XML all at once, as a string.

Matthew Flaschen 2009-05-12 19:18:45

@Matthew, yeah, that's the example where he calls .ReadToEnd(), but you could just use .Read(), etc. My guess is the OP will need to do what you said.

Richard Morgan 2009-05-12 19:25:24

That link was extremely useful

Meiscooldude 2009-05-12 19:39:44

I just noticed the XmlSanitizingStream towards the bottom of the blog post. My mistake.

Matthew Flaschen 2009-05-13 00:56:26

+1 A:

Garbage In, Garbage Out. If the remote application is sending you garbage, then that's all you'll get. If they think they're sending XML, then they need to be fixed. In this case, you're not doing them any favors by working around their bug.

You should also make sure of what they think they're sending. What did the %1C mean to them? What did they want it to be?

John Saunders 2009-05-12 19:15:08

I wish I was in a position to fix their bug, but I'm not... The bug comes from unfiltered user input... Some users decide to put some super weird characters in there... and it accepts it...

Meiscooldude 2009-05-13 19:50:12

My recommendation would be to reject the garbage, then produce a report showing what got rejected. Then send that report to the owner of the buggy code, at least once per month.
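
A minimal sketch of that reject-and-report idea, in Python rather than the .NET the thread is about. The function names and the report format are made up for illustration; only the standard `re` module is used. It lists each illegal control character with its offset, so the report can show exactly what was rejected.

```python
import re

# C0 control characters that XML 1.0 does not allow
# (tab, line feed and carriage return are legal and are not matched).
_ILLEGAL_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_chars(raw):
    """Return (offset, code point label) for every illegal control character."""
    return [
        (m.start(), "U+{:04X}".format(ord(m.group())))
        for m in _ILLEGAL_CONTROLS.finditer(raw)
    ]


def rejection_report(raw):
    """Return report lines for a rejected document, or an empty list if it is clean."""
    return [
        "offset {}: illegal character {}".format(offset, label)
        for offset, label in find_illegal_chars(raw)
    ]


# Hypothetical payload containing the 0x1C (%1C) character from the question.
payload = "<note>first\x1csecond</note>"
for line in rejection_report(payload):
    print(line)
```

A document with an empty report can go on to the parser; anything else is rejected and its report lines are kept for the owner of the sending application.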

John Saunders 2009-05-13 22:25:13

A:

IMHO the best solution would be to modify the code/program/whatever produced the invalid XML that is being fed to your program. Unfortunately this is not always possible. In that case you need to escape or remove the control characters below 0x20 (except tab, line feed, and carriage return, which XML allows) before trying to load the document.

Darin Dimitrov 2009-05-12 19:15:48

A:

If you really can't fix the source XML data, consider taking an approach like the one I described in this answer. Basically, you create a TextReader subclass (e.g. StripTextReader) that wraps an existing TextReader (tr) and discards invalid characters.
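
The linked answer is not reproduced here, so what follows is only a rough Python analogue of the wrapping-reader idea, not the original .NET StripTextReader. The class and function names are hypothetical. The wrapper filters each chunk as it is read, so the document never has to be held in memory as one string.

```python
import io
import xml.etree.ElementTree as ET


def is_legal_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


class StrippingReader:
    """Wraps a text stream and drops illegal XML characters from each read."""

    def __init__(self, inner):
        self._inner = inner

    def read(self, size=-1):
        while True:
            chunk = self._inner.read(size)
            if not chunk:
                return chunk  # real end of stream
            cleaned = "".join(ch for ch in chunk if is_legal_xml_char(ch))
            if cleaned:
                return cleaned
            # The whole chunk was illegal: read again so the caller
            # does not mistake an empty string for end of stream.


def parse_stripped(stream, chunk_size=4096):
    """Feed the filtered stream to the parser chunk by chunk."""
    reader = StrippingReader(stream)
    parser = ET.XMLParser()
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


root = parse_stripped(io.StringIO("<note>a\x1cb</note>"))
print(root.text)
```

Note that this silently discards the characters, which is only appropriate once you know they carry no meaning for the sender.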

Matthew Flaschen 2009-05-12 19:20:49

Your answer implies that the characters really are garbage, and that all he needs to do is discard them. I suggested he should first find out what those characters are meant to be.

John Saunders 2009-05-12 19:23:46

+3 A:

XML can handle just about any character, but there are some ranges, such as most control codes, that it won't accept.

Your best bet, if you can't get them to fix their output, is to sanitize the raw data you're receiving. You need to replace illegal characters with the character reference format you noted.

(You can't even resort to CDATA, as there is no way to escape these characters there.)
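
For illustration, a small Python sketch of sanitizing the raw text before parsing. One caveat: in XML 1.0 a character reference to an illegal code point (such as `&#x1C;`) is itself not well-formed, so a conforming parser will typically reject it too. This sketch therefore swaps in a different technique and substitutes U+FFFD (the Unicode replacement character) rather than a character reference. The function name and the default replacement are assumptions, not anything from the posts above.

```python
import re
import xml.etree.ElementTree as ET

# Code points outside the XML 1.0 Char production: C0 controls other than
# tab/LF/CR, lone surrogates, and the noncharacters U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def sanitize_xml_text(raw, replacement="\ufffd"):
    """Replace every illegal character; pass replacement="" to strip instead."""
    return _ILLEGAL_XML_CHARS.sub(replacement, raw)


dirty = "<note>field one\x1cfield two</note>"
root = ET.fromstring(sanitize_xml_text(dirty))
print(repr(root.text))
```

Using a visible placeholder instead of silently dropping the character keeps a trace of where the bad input was, which helps when reporting the problem upstream.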

great_llama 2009-05-12 19:26:33
