




I'm trying to parse an XML file using PHP, but I get an error message:

parser error : Char 0x0 out of allowed range in

I think it's caused by the content of the XML: it contains a special symbol, "☆". Any ideas how I can fix it?

I also get:

parser error : Premature end of data in tag item line

What might be causing that error?

I'm using simplexml_load_file.


I found the line the error points to and pasted its content into a separate XML file, and that file parses fine! So I still cannot figure out what makes the full XML file fail to parse. PS: it's a huge XML file, over 100 MB. Could the size cause the parse error?


Do you have control over the XML? If so, ensure the data is enclosed in <![CDATA[ .. ]]> blocks.

I can't control the XML, but I can ask. But is that the solution? Let me check.
Matthew Wilson
Yes, I agree. Dominic has the solution.
user315396: Sorry but no way you've fixed "out of allowed range" with a CData section.

If you have control over the data, ensure that it is encoded correctly, i.e. that it is in the encoding you promised in the XML declaration. For example, if you have:

<?xml version="1.0" encoding="UTF-8"?>

then you'll need to ensure your data is in UTF-8.

If you don't have control over the data, yell at those who do.
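To find out where the data breaks the promised encoding, something like the following might help. It is only a rough sketch: findBadLines is a made-up helper name, it assumes the file is declared as UTF-8 and that the mbstring extension is loaded, and it reads line by line so a 100 MB file does not have to fit in memory.

```php
<?php
// Hypothetical helper (not from the thread): report lines that are not valid
// UTF-8 or that contain control bytes XML 1.0 does not allow.
// Assumes ext-mbstring is available and the file is meant to be UTF-8.
function findBadLines(string $path, int $limit = 10): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $bad = [];
    $lineNo = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        // Byte sequences that do not decode as UTF-8 (e.g. stray Big5 bytes).
        $invalidUtf8 = !mb_check_encoding($line, 'UTF-8');
        // C0 control bytes other than tab, LF and CR, including 0x0.
        $hasControl = preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $line) === 1;

        if ($invalidUtf8 || $hasControl) {
            $bad[$lineNo] = $invalidUtf8 ? 'invalid UTF-8' : 'control character';
            if (count($bad) >= $limit) {
                break; // stop after the first few hits
            }
        }
    }
    fclose($handle);

    return $bad; // line number => reason
}
```

Calling findBadLines('test.xml') returns an array keyed by line number, which gives you something concrete to send to whoever produces the file.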

Dominic Rodger
"I found the line the error points to and pasted its content into a separate XML file, and that file parses fine! So I still cannot figure out what makes the full XML file fail to parse."
In this case, this exactly reinforces what Dominic is saying.
OK, I think some of the data is not UTF-8. Actually, if I open the XML in Firefox, there is an error message about an invalid character. As for IE, it's a big file; I waited a long time but got no response.
Use xmllint (http://www.xmlsoft.org/xmllint.html) for checking big files.
Dominic Rodger

Not all valid UTF-8 characters are allowed in XML documents. E.g. for XML 1.0 the standard says:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
which doesn't include 0x0.

You can skip those characters e.g. with a (read) filter that is applied to the input stream for simplexml_load_file, see php://filter at http://docs.php.net/wrappers.php

stream_filter_register('xmlutf8', 'ValidUTF8XMLFilter');

// file_put_contents('test.xml', '<a>foo'.chr(0).'</a>');
// $doc = simplexml_load_file('test.xml'); => Char 0x0 out of allowed range

$doc = simplexml_load_file("php://filter/read=xmlutf8/resource=test.xml");
echo $doc->asXML();

class ValidUTF8XMLFilter extends php_user_filter {
  // Matches every run of characters outside the XML 1.0 Char production quoted above.
  protected static $pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

  function filter($in, $out, &$consumed, $closing) {
    while ($bucket = stream_bucket_make_writeable($in)) {
      $consumed += $bucket->datalen;
      // With /u, preg_replace() returns NULL if the bucket is not valid UTF-8
      // (e.g. stray non-UTF-8 bytes); in that case pass the data on unchanged.
      $clean = preg_replace(self::$pattern, '', $bucket->data);
      if ($clean !== null) {
        $bucket->data = $clean;
      }
      stream_bucket_append($out, $bucket);
    }
    return PSFS_PASS_ON;
  }
}
VolkerK, I tried it but got the same error, plus another one: Fatal error: Call to a member function asXML() on a non-object. I replaced the load-file resource, added the register call at the beginning, and added the class at the end.
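The "non-object" fatal error appears because simplexml_load_file returns false when parsing fails, so calling asXML() on the result blows up. A minimal sketch of checking the result and collecting libxml's own messages (with line and column) is below; loadXmlWithErrors is just an illustrative name, not something from the answer above.

```php
<?php
// Hypothetical wrapper: load an XML source and return both the document
// (or null on failure) and the parser's error messages.
function loadXmlWithErrors(string $source): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_file($source);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // Compare with === false: an empty SimpleXMLElement is also "falsy".
    return ['doc' => $doc === false ? null : $doc, 'errors' => $errors];
}
```

The $source argument can also be the php://filter/read=xmlutf8/resource=test.xml URL, so you can see whether the filter changed what the parser complains about.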

Make sure your XML source is valid. See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

The XML is valid. But what if the encoding is declared as UTF-8 and there is a Big5 character in it? I found the char "".
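If a few Big5 bytes really are mixed into a file declared as UTF-8 and the producer cannot fix it, one lossy workaround is to write a cleaned copy before parsing. This is only a sketch under assumptions: writeCleanedCopy is a made-up name, it needs the mbstring extension and PHP 7.2+ for mb_scrub, and the stray characters are replaced rather than recovered.

```php
<?php
// Hypothetical helper: copy $inPath to $outPath line by line, replacing
// byte sequences that are not valid UTF-8 and dropping control bytes that
// XML 1.0 forbids. Lossy: the original Big5 characters are not recovered.
function writeCleanedCopy(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $changed = 0;
    while (($line = fgets($in)) !== false) {
        // Replace invalid UTF-8 with mbstring's substitute character.
        $clean = mb_scrub($line, 'UTF-8');
        // Remove C0 control bytes except tab, LF and CR (this includes 0x0).
        $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $clean);
        if ($clean !== $line) {
            $changed++;
        }
        fwrite($out, $clean);
    }
    fclose($in);
    fclose($out);

    return $changed; // number of lines that were modified
}
```

Working line by line avoids splitting a multi-byte character across chunks, and valid characters such as "☆" pass through untouched.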