The problem only happens with one file when I call a DOMDocument/SimpleXML method, so the issue seems to be with that file. I have no clue what it could be.

If I do the following:

$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);

print_r($xml);

in Chrome, I get a "Page Unavailable" error. In Firefox, I get nothing.

If I do the same thing but to a "test2.html", I get a print out as expected.

If I try the same thing but doing it this way:

$file = "test1.html";
$data = file_get_contents($file);
$dom = DOMDocument::loadHTML($data);
$xml = simplexml_import_dom($dom);

print_r($xml);

I get the same issue.

If I comment out the print_r line, Chrome goes from the "Page Unavailable" to blank.

I changed the permissions to 777 in case that was the issue, but it didn't help.

I tried simply echoing out the contents of the html, no problem at all.

Any clues as to why a) Chrome would do that, and b) why I'm not getting any usable results?


Update:

If I put in:

$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
    echo "No Load!";
}
else {
    $xml = simplexml_import_dom($dom);
    print_r($xml);
}

I get the same issue. If I put in:

$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
if(!$dom) {
    echo "No Load!";
}
else {
    echo "Load!";
}

I get the "Load!" output, meaning that the dom method shouldn't be the problem (?)

I'll try the same exact test with the simplexml.


Update2:

If I do this:

$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
    echo "No Load!";
}
else {
    echo "Load!";
}

I get "Load!" but if I do:

$file = "test1.html";
$dom = DOMDocument::loadHTMLFile($file);
$xml = simplexml_import_dom($dom);
if(!$xml) {
    echo "No Load!";
}
else {
    echo "Load!";
    print_r($xml);
}

I get the error. I did finally notice that I had an option to view the error in Chrome:

 Error 324 (net::ERR_EMPTY_RESPONSE): Unknown error.

The troublesome HTML file is 288 KB. Could that be the issue? If so, how would I adjust for that?


Last Update:

Very Odd. I can use methods and functions on the object (as simplexml or domdocument), so I can do things like xpath to delete or parse the html, etc. In some cases (small results) it can echo out results, but for big stuff (show all spans), it fails in the same way.

So, since I think the end result will fit within those limits, I SHOULD be okay (I guess).

But any real solution is very welcome.

A:
  • Turn on error reporting: error_reporting(E_ALL); in the first line of your PHP code.
  • Check the memory limit of your PHP configuration: memory_limit in the respective php.ini
  • What's the difference between test1.html and test2.html? Perhaps test1.html is not well-formed.
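
The first two checks could be sketched roughly as follows. This is only an illustration, not a diagnosis: the display_errors setting and the memory readouts are additions that are not in the original question, and the variable names are made up for the example.

```php
<?php
// Report every error and show it in the response while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Read the configured limit (a string such as "128M", or "-1" for unlimited)
// and note how much memory the script uses before any work is done.
$limit = ini_get('memory_limit');
$usedBefore = memory_get_usage();

// ... load and process the document here ...

// Compare the peak usage against the limit to see how close the script gets.
$peak = memory_get_peak_usage();
echo "memory_limit: $limit\n";
echo "peak usage: " . round($peak / 1048576, 2) . " MB\n";
```

If the peak is close to the limit, raising memory_limit in php.ini would be the next thing to try.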
Stefan Gehrig
A: 

DOMDocument and/or SimpleXML may bail out if the document is malformed. Try something like:

$dom = new DOMDocument();
if (!$dom->loadHTMLFile($file)) {
    echo 'Loading file failed';
    exit;
}

$xml = simplexml_import_dom($dom);
if (!$xml) {
    ...
}

If creating the $dom worked, conversion to $xml should work as well, but make sure anyway.
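
To see whether the parser actually complains about the markup, one option is to collect libxml's errors and print them. The following is a minimal sketch under assumptions: the $html string is a hypothetical stand-in for the contents of test1.html, and counting the span elements instead of dumping the whole object is just one way to keep the output small.

```php
<?php
// Collect libxml parse errors instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample markup standing in for the contents of test1.html.
$html = '<html><body><p>Unclosed paragraph<div><span>text</span></body></html>';

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// List whatever the parser recorded, then reset the error buffer.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo "line {$error->line}: " . trim($error->message) . "\n";
}
libxml_clear_errors();

if ($loaded) {
    $xml = simplexml_import_dom($dom);
    // Report a small, targeted result rather than printing the whole tree.
    echo count($xml->xpath('//span')) . " span element(s) found\n";
}
```

An empty error list does not prove the file is fine, but a long one would point at where the markup is broken.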

Edit: As Gehrig said, make sure error reporting is on; that should make it obvious where the process fails.

deceze
Error reporting is on. Conditionals on both $dom and $xml show that they load (see updates), but I still get nothing from print_r. Is the 288K file size the issue?
Anthony