I'm trying to parse an XML file using PHP, but I get an error message:

parser error : Char 0x0 out of allowed range in

I think it's caused by the content of the XML: it contains a special symbol, "☆". Any ideas how I can fix it?

I also get:

parser error : Premature end of data in tag item line

What might be causing that error?

I'm using simplexml_load_file.

Update:

I tried isolating the line the error points to and pasting its content into a separate XML file, and that file parses fine! So I still cannot figure out what makes the XML file fail to parse. P.S. It's a huge XML file, over 100 MB. Could the size cause the parse error?

A: 

Do you have control over the XML? If so, ensure the data is enclosed in <![CDATA[ .. ]]> blocks.

Jhong
I can't control the XML, but I can ask... Is that the solution? Let me check.
Matthew Wilson
Yes, I agree. Dominic has the solution.
Jhong
user315396: Sorry but no way you've fixed "out of allowed range" with a CData section.
chendral
A: 

If you have control over the data, ensure that it is encoded correctly, i.e. that it is in the encoding you promised in the XML declaration. For example, if you have:

<?xml version="1.0" encoding="UTF-8"?>

then you'll need to ensure your data is in UTF-8.

If you don't have control over the data, yell at those who do.
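
One way to check whether the data really is UTF-8 is to scan the file line by line, which also works for very large files. This is only a minimal sketch: the function name firstInvalidUtf8Line is made up for illustration, and it relies on the common idiom that preg_match() with the /u modifier fails on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): returns the 1-based number of the
// first line that is not valid UTF-8, or null if every line is valid.
function firstInvalidUtf8Line(string $path): ?int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lineNumber = 0;
    try {
        while (($line = fgets($handle)) !== false) {
            $lineNumber++;
            // With /u, PCRE validates the subject as UTF-8 before matching;
            // an empty pattern matches any valid string, so anything other
            // than 1 here means the line contains invalid UTF-8.
            if (preg_match('//u', $line) !== 1) {
                return $lineNumber;
            }
        }
    } finally {
        fclose($handle);
    }
    return null;
}
```

Note that a NUL byte is valid UTF-8, so this check will not flag the "Char 0x0" case; it only finds bytes that are in some other encoding. If the offending bytes turn out to be in a known encoding such as Big5, converting them with iconv() before parsing may be an option.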

Dominic Rodger
"I tried isolating the line the error points to and pasting its content into a separate XML file, and that file parses fine! So I still cannot figure out what makes the XML file fail to parse."
In this case, this exactly reinforces what Dominic is saying.
Jhong
OK... I think some of the data is not UTF-8. Actually, if I open the XML in Firefox, there is an error message about an invalid character. As for IE... hmm, it's a big file; I waited a long time but got no response.
Use xmllint (http://www.xmlsoft.org/xmllint.html) for checking big files.
Dominic Rodger
+2  A: 

Not all valid UTF-8 characters are allowed in XML documents. E.g. for XML 1.0 the standard says:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
which doesn't include 0x0.
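
The Char production above can be written out as a small range check. This is only a sketch; isXml10Char is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper: true if $codePoint is allowed by the XML 1.0
// Char production quoted above.
function isXml10Char(int $codePoint): bool
{
    return $codePoint === 0x9
        || $codePoint === 0xA
        || $codePoint === 0xD
        || ($codePoint >= 0x20 && $codePoint <= 0xD7FF)
        || ($codePoint >= 0xE000 && $codePoint <= 0xFFFD)
        || ($codePoint >= 0x10000 && $codePoint <= 0x10FFFF);
}

var_dump(isXml10Char(0x0));    // bool(false): NUL is not allowed
var_dump(isXml10Char(0x2606)); // bool(true): "☆" (U+2606) is allowed
```

So a correctly encoded "☆" is allowed in XML 1.0, while a NUL byte never is.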

You can skip those characters e.g. with a (read) filter that is applied to the input stream for simplexml_load_file, see php://filter at http://docs.php.net/wrappers.php

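<?php
class ValidUTF8XMLFilter extends php_user_filter {
  // Matches every code point outside the XML 1.0 Char production,
  // including the supplementary range #x10000-#x10FFFF.
  protected static $pattern = '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

  public function filter($in, $out, &$consumed, $closing)
  {
    while ($bucket = stream_bucket_make_writeable($in)) {
      $consumed += $bucket->datalen;
      $clean = preg_replace(self::$pattern, '', $bucket->data);
      // preg_replace() returns null when the chunk is not valid UTF-8
      // (wrong encoding, or a multibyte character split across two buckets).
      // Pass such a chunk through unchanged instead of silently dropping it.
      if ($clean !== null) {
        $bucket->data = $clean;
      }
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}

stream_filter_register('xmlutf8', 'ValidUTF8XMLFilter');

// file_put_contents('test.xml', '<a>foo'.chr(0).'</a>');
// $doc = simplexml_load_file('test.xml'); => Char 0x0 out of allowed range

$doc = simplexml_load_file("php://filter/read=xmlutf8/resource=test.xml");
echo $doc->asXML();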
VolkerK
VolkerK, I tried it but got the same error, and another error happened: Fatal error: Call to a member function asXML() on a non-object. I replaced the resource in the load call, added the register call at the beginning, and added the class at the end.
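
The "non-object" fatal error means the load call returned false because parsing failed, so the result needs checking before asXML() is called. A minimal sketch of collecting the parser's own error list first follows; parseXmlWithErrors is a hypothetical helper name, and it uses simplexml_load_string only to keep the example self-contained (the same libxml functions work with simplexml_load_file).

```php
<?php
// Hypothetical helper: parses $xml and returns a two-element array:
// the SimpleXMLElement (or null on failure) and a list of error messages.
function parseXmlWithErrors(string $xml): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc === false ? null : $doc, $messages];
}

// Usage: only call asXML() when parsing succeeded.
list($doc, $errors) = parseXmlWithErrors('<a><b></a>');
if ($doc === null) {
    echo implode("\n", $errors), "\n";
} else {
    echo $doc->asXML();
}
```

The reported line and column can help locate the offending bytes in a large file.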
A: 

Make sure your XML source is valid. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

stillstanding
The XML is valid. But what if the declared encoding is UTF-8 and there is a Big5 character in the data? I found the char "".