Quick version:
What's the standard (innovative? any?) way of catching and handling errors thrown by XMLReader on a malformed file, specifically unescaped characters? Preprocessing with Tidy (etc.) isn't a super appealing option. Does anyone know of a way to simply skip the offending node and move right along?
Descriptive Version:
We all know that it isn't XML if it isn't well-formed, but let's be honest: it happens. A client regularly pulls in massive (50-100MB+) XML files which need to be read into MySQL. XMLReader is the obvious choice, and we've written a wrapper which works well for our needs.
Occasionally an error occurs and read() fails, killing the import. Drat! It's almost always an unescaped character (e.g. "&") which trips everything up. In most situations we'd just have the client call the data provider and demand they fix their defective file. Unfortunately, the data providers aren't always obliging and/or timely. It would be amazing if we could simply catch the error and move right along to the next node.
I've spent quite a while reading up on this and trying to crack it, and can't find anything worth pursuing. Am I missing something obvious?
This SO question seemed promising, but it's just not yielding any results. Passing the 1 seems like it should ask the reader to recover, but we're not seeing any recovery attempt, any different error messages, etc. Here's the relevant code outlining the approach:
$xml->open($file, null, LIBXML_NOERROR | LIBXML_NOWARNING | 1);
I could always preprocess with Tidy, but there must be a better way.
I've considered some more "creative" approaches, such as sniffing the next read() with a try/catch after the logic for the present node has completed, but that seems clumsy at best. It also seems like there could be potential in emulating read() with a custom wrapper function that helps move through the nodes and incorporates error handling, but I have a feeling I'm oversimplifying things.
So to sum it all up: When read() fails, how can I catch the error and move along? Any chance we can see what error is coming (at least the message the XMLReader would have thrown)?
$xml = new XMLReader();
$xml->open($file);
while ($xml->read()) {
    // per-node import logic goes here; read() returns false on a parse error
}
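For the "can we at least see the message" part, here is a minimal sketch of what I have in mind, assuming libxml_use_internal_errors() and libxml_get_errors() capture what XMLReader reports. The helper name scanXml() and the sample string are hypothetical and only for illustration. In the real import, open($file) would replace the XML() call. This does not skip the bad node. It only collects whatever libxml reported before read() gave up.

```php
<?php
// Hypothetical helper (not part of our wrapper): walks a document with
// XMLReader and returns the libxml errors collected along the way.
function scanXml(string $xml): array
{
    // Collect libxml errors internally instead of emitting them as warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml); // for a file on disk this would be $reader->open($file)

    $nodesRead = 0;
    // "@" silences the generic PHP warning read() may raise on failure;
    // the detailed messages are still collected by libxml.
    while (@$reader->read()) {
        $nodesRead++;
    }
    $reader->close();

    // Turn each LibXMLError into a readable "line, column: message" string.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['nodesRead' => $nodesRead, 'errors' => $messages];
}

// Sample input with an unescaped "&", the case that usually breaks our imports.
$result = scanXml('<items><item>fish & chips</item></items>');
foreach ($result['errors'] as $message) {
    echo $message, PHP_EOL;
}
```

How many nodes come back before the failure, and the exact message text, likely depend on the libxml version, so I would not rely on either.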