Hi,
How do you deal with malformed data in XML files? For example, suppose I have:
<text>Some &improper; text here.</text>
I'm trying to do:
$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->formatOutput = false;
$doc->load('...xml');
and it fails because &improper; is an undefined entity. Note that I can't use CDATA because of the way the software is written. I'm writing a module that reads and writes XML, and sometimes users insert improper text.
I've noticed that DOMDocument->loadHTML() nicely encodes everything, but how could I continue from there?
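
One workaround that might fit here, as a rough sketch only: escape stray ampersands before handing the string to DOMDocument::loadXML(). The helper name escapeStrayAmpersands() is made up for illustration. It assumes the files have no DTD that declares custom entities and contain no CDATA sections, since a "&" in either place would be rewritten as well. libxml_use_internal_errors() is used so parse errors are collected instead of raised as warnings.

```php
<?php
// Hypothetical helper (not part of DOM or libxml): turn every "&" that does
// not start one of the five predefined XML entities or a numeric character
// reference into "&amp;", so the parser sees it as literal text.
function escapeStrayAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

// Sample input taken from the example above.
$raw = '<text>Some &improper; text here.</text>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->validateOnParse = false;
$doc->formatOutput = false;

// Parse the cleaned-up string rather than the raw file contents.
$loadedFixed = $doc->loadXML(escapeStrayAmpersands($raw));

// Inspect or log anything libxml still complained about, then reset.
$errors = libxml_get_errors();
libxml_clear_errors();

// The text node now holds the user's original characters, ampersand included.
$text = $loadedFixed ? $doc->documentElement->textContent : null;
```

For a file on disk, the same idea would be to read it with file_get_contents(), run it through the helper, and pass the result to loadXML() instead of calling load() directly. When the document is saved again, the ampersand is written back out as &amp;, so the output stays well-formed.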