Hi. Maybe I am missing something, but the DOMDocument object appears to be empty in this code:

$input = file_get_contents('http://www.google.com/');
$doc = new DOMDocument();
@$doc->loadHTML($input); // suppress warnings on invalid HTML
var_dump($doc);
die();

I really don't know what could be wrong with that code. I have verified that $input is actually filled with the HTML contents of the web page.

The output is: object(DOMDocument)#3 (0) { }

I don't understand why...

A: 

If you remove the @, loadHTML emits warnings because the HTML document being loaded has invalid syntax. A few of the HTML tags may be causing the problem.

Try replacing

$doc->loadHTML($input);

with

$doc->loadHTML(htmlentities($input));

Hope this helps.

Yogesh
" LoadHTMLfile: The function parses the HTML document in the file named filename. " File_get_contents loads the webpage into a string. So LoadHTML is the correct function to use.
reggie
@reggie: I have edited the answer, please check now.
Yogesh
I'm sorry, but that's just bogus. It would escape all the HTML tags in the string, so the parser would see only plain text and the DOM would have nothing to work on.
reggie
A: 
JapanPro
If you add "var_dump($doc);" to your code, you'll see that $doc is still an empty object. At least it is for me!
reggie
The error suppression is not the problem. The PHP manual specifically states that a string can be loaded even if it is not valid HTML.
reggie
+1  A: 

This is expected behaviour. To see the HTML, use DOMDocument::saveHTML() (or saveXML()).
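
A minimal sketch of that, assuming a small hard-coded string in place of the fetched page (the sample markup is made up for illustration). It also uses libxml_use_internal_errors() rather than @ to keep parser warnings quiet, which is a substitution on my part and not something the question requires:

```php
<?php
// Hypothetical sample markup standing in for file_get_contents(...).
$input = '<html><body><p>Hello</p></body></html>';

$doc = new DOMDocument();

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($input);
libxml_clear_errors();

// Serialize the parsed document back to an HTML string.
$html = $doc->saveHTML();
echo $html;
```

If the echoed string contains your markup, the document was loaded, whatever var_dump() shows.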

salathe
Thanks, I see you are right. So why can't I view the DOM document object's contents? Does it behave like a resource?
reggie
+1  A: 

The output is: object(DOMDocument)#3 (0) { }

Yes. That's what a var_dumped DOMDocument looks like.

If you want to look at the HTML representation of the content inside the document, call saveHTML() on it. That spits out a cleaned-up version of the HTML on Google's home page for me.
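
Another way to convince yourself the document is populated is to query it through the DOM API instead of dumping it. A rough sketch, again assuming made-up sample markup rather than the real Google page:

```php
<?php
// Hypothetical sample markup; the real page would come from file_get_contents().
$input = '<html><head><title>Example</title></head>'
       . '<body><a href="/a">A</a><a href="/b">B</a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($input);
libxml_clear_errors();

// The root element and any descendants are reachable even though
// var_dump($doc) may print an empty-looking object.
$rootName  = $doc->documentElement->nodeName;
$linkCount = $doc->getElementsByTagName('a')->length;

echo $rootName, "\n";
echo $linkCount, "\n";
```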

bobince