
I am trying to read in the body of a certain webpage to display it on a separate webpage, but I am having a bit of trouble with it. Right now, I use the following code:

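<?php
$doc = new DOMDocument();
// Suppress warnings about malformed HTML on the remote page
@$doc->loadHTMLFile('http://foo.com');
$tags = $doc->getElementsByTagName('body');
$index_text = '';   // initialize before appending with .=
foreach ($tags as $tag) {
    $index_text .= $tag->nodeValue;
    print nl2br($tag->nodeValue).'<br />';
}
?>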

This code works, but it seems to remove a lot of formatting that is important to me, such as line breaks. How do I stop that from happening?

A:

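The formatOutput property of DOMDocument controls this, but only for markup that DOMDocument serializes itself via saveHTML() or saveXML(). nodeValue returns just the text content of a node, with every tag (including <br>) stripped, so formatOutput has no effect on it; you need to serialize the body's nodes instead of reading nodeValue.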

$doc->formatOutput = true;

This makes the serialized output easier for humans to read, adding line breaks and indentation where appropriate (i.e. "pretty printing").

The default value is false, so you have to set it to true explicitly when you need it.
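A minimal sketch of how this could look when applied to the body of a page. It assumes an inline HTML string in place of the remote page (with a real URL you would call loadHTMLFile('http://foo.com') as in the question), and it serializes each child of <body> with saveHTML($node), which accepts a node argument in PHP 5.3.6 and later. Unlike nodeValue, this keeps the tags, so line breaks written as <br> survive.

```php
<?php
// Sketch only: the inline $html string is a stand-in for the remote page.
$html = '<html><body><h1>Title</h1><p>First line<br>Second line</p></body></html>';

$doc = new DOMDocument();
$doc->formatOutput = true;   // pretty-print whatever saveHTML() emits

// Collect libxml warnings about sloppy HTML instead of printing them
// (an alternative to the @ operator used in the question).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$body = $doc->getElementsByTagName('body')->item(0);

$bodyHtml = '';
if ($body !== null) {
    // saveHTML($node) serializes a node with its markup intact,
    // whereas $body->nodeValue would return only the concatenated text.
    foreach ($body->childNodes as $child) {
        $bodyHtml .= $doc->saveHTML($child);
    }
}

echo $bodyHtml;
```

Because the markup is preserved, the nl2br() call from the question should no longer be needed when echoing $bodyHtml into the other page.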

Jon Cram