I have the following two errors when using XMLReader.

1) Warning: XMLReader::read() [xmlreader.read]: MyXML.xml:43102: parser error : xmlParseEntityRef: no name

2) Warning: XMLReader::read() [xmlreader.read]: ^ in MyXMLReader.php on line 56

Does anyone know what those refer to?

My PHP Code (The XML file is about 100MB so I can't include it):

<?php 

//Assign file names
$XMLFile = 'MyXML.xml';
$CSVFile = 'MyCSV.csv';

//take start time to calculate run-time
$time_start = time();

//Open PHP's XMLReader.  XMLReader reads the XML node by node to keep memory use small.
//1<<19 (524288) is the value of the LIBXML_PARSEHUGE option.
$xml = new XMLReader(); 
$xml->open($XMLFile, null, 1<<19); 

//Loop through all elements.  Save all text from tags and attributes.
$row = array(); //initialise so $row exists even if no node is read
while ($xml->read()) {

    //Note: for TEXT nodes $xml->name is always '#text', so all text shares one key.
    if($xml->nodeType == XMLReader::TEXT) { 
        $row[$xml->name] = $xml->value;
    }

    if($xml->hasAttributes)  {
        while($xml->moveToNextAttribute()) { 
            $row[$xml->name] = $xml->value;
        }
    }
}

//save the titles which should appear in CSV file.  All others will not be included.
$SavedRows = $row;
unset($row);

//Remove unnecessary columns i.e. datasource URLs
$RemoveColumn='xmlns:message, xmlns:common, xmlns:frb, xmlns:xsi, xsi:schemaLocation, xmlns:kf';
$RemoveColumns = explode(',', $RemoveColumn);

foreach($RemoveColumns as $key => $val) {
    $val = trim($val);
    unset($SavedRows[$val]);
}

//initiate all rows which should be included
foreach($SavedRows as $key => $val) {
    $row[$key] = '';
}

//Create csv file
$fp = fopen($CSVFile, 'w');

//Input the column headings as first row
fputcsv($fp, array_keys($row), ',');

// Start 2nd loop through XML.
$xml = new XMLReader(); 
$xml->open($XMLFile, null, 1<<19); 

while ($xml->read()) {

    //Determine if the tag is empty (self-closing).  Empty tags carry the data in their attributes; non-empty tags contain series information.
    $Output = $xml->isEmptyElement;

    //Take data from non empty XML tags
    if($xml->nodeType == XMLReader::TEXT) { 
        if(isset($SavedRows[$xml->name])) {
            $row[$xml->name] = $xml->value;
        }
    }

    //take data from XML tag attributes
    if($xml->hasAttributes)  {
        while($xml->moveToNextAttribute()) { 
            if(isset($SavedRows[$xml->name])) {
                $row[$xml->name] = $xml->value;
            }
        }
    }

    //If tag is empty, assume it is data and write row to file.
    if($Output) {
        fputcsv($fp, array_values($row), ',');
    }

}

//Close file handle
fclose($fp);

//Calculate runtime
$time_end = time();
$time = $time_end - $time_start;

echo "Complete.  Runtime: $time seconds";

?>
+2  A: 
xmlParseEntityRef: no name 

Means you've got bogus unescaped ampersands in the XML file. (Well, “XML”... technically if it ain't well-formed, it ain't XML.)

You'll need to check the file for lone &s and escape them to &amp; (or, better, fix the code that generated the file). According to the error, the first one's on line 43102 of the file (yikes!).

bobince
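To check the file for lone ampersands without loading all 100MB into memory, something like the following rough sketch could be used. The function names `hasBareAmpersand` and `findBareAmpersands` are made up for illustration. The regular expression only checks that an `&` is followed by something shaped like an entity or character reference, and it does not know about comments or CDATA sections, so it may flag ampersands there that are actually legal.

```php
<?php
// Hypothetical helper: true if $line contains an "&" that does not start
// something shaped like an entity or character reference.
function hasBareAmpersand($line)
{
    // An "&" is accepted only when followed by a name and ";" (e.g. &amp;)
    // or by a decimal/hex character reference (e.g. &#38; or &#x26;).
    $pattern = '/&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/';
    return preg_match($pattern, $line) === 1;
}

// Hypothetical helper: returns up to $limit line numbers (1-based) that
// contain a bare ampersand, reading the file one line at a time.
function findBareAmpersands($path, $limit = 10)
{
    $hits = array();
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return $hits;
    }
    $lineNo = 0;
    while (($line = fgets($fh)) !== false) {
        $lineNo++;
        if (hasBareAmpersand($line)) {
            $hits[] = $lineNo;
            if (count($hits) >= $limit) {
                break;
            }
        }
    }
    fclose($fh);
    return $hits;
}
```

Calling `print_r(findBareAmpersands('MyXML.xml'));` should list the first few offending lines, which can then be compared with the line number in the parser warning.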
Is there a way to escape them in the `while ($xml->read())` loop?
Brian
Nope. If you've got an unescaped `&` in the file, the parser fails before your loop ever sees that content. You would have to pre-process the file as text first; see http://stackoverflow.com/questions/2049400/php-domdocument-loadxml-with-xml-containing-ampersand-less-greater/2049526#2049526 for an example. However this is not watertight (it may mess up comment/PI/CDATA-section content); it is not a sustainable solution. The faulty program that generated the non-XML needs to be fixed.
bobince
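For a file you can't get fixed at the source, the pre-processing workaround might look roughly like the sketch below. It copies the file line by line, so the whole document never has to sit in memory, and escapes any `&` that does not already begin an entity or character reference. The helper names and the output file name are assumptions made for illustration, and the caveat above still applies: the regex is not aware of comments, processing instructions or CDATA sections, so ampersands inside those will be rewritten too.

```php
<?php
// Hypothetical helper: escape every "&" that is not already the start of
// something shaped like an entity or character reference.
function escapeBareAmpersands($text)
{
    $pattern = '/&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/';
    return preg_replace($pattern, '&amp;', $text);
}

// Hypothetical helper: write a cleaned copy of $source to $target.
// Working per line is safe here because a reference cannot span a newline.
function writeCleanCopy($source, $target)
{
    $in = fopen($source, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($target, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }
    while (($line = fgets($in)) !== false) {
        fwrite($out, escapeBareAmpersands($line));
    }
    fclose($in);
    fclose($out);
    return true;
}
```

With this in place, the script could call `writeCleanCopy('MyXML.xml', 'MyXML.clean.xml')` once and then point both `$xml->open(...)` calls at the cleaned copy instead of the original.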
I would fix the XML if I was producing it. Unfortunately it comes from a government agency. It's pretty bad that they are using illegal syntax.
Brian