Basically, I want to write PHP code that lists the contents of all the <h1> tags on a page at an external URL.

I don't want just the first one, but all of them. So if the source of the external website is:

<html>
  <title></title>
  <head></head>
  <h1>Test Here</h1>
  <h1>Test here</h1>
</html>

I want to write a script that outputs only the content between the <h1> tags, which would be:

Test Here
Test here

I'm familiar with PHP, but I just can't think of a script that does that.

+3  A: 

simple_html_dom is your friend.

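// Assumes simple_html_dom.php has been downloaded next to this script.
include 'simple_html_dom.php';

$dom = file_get_html("http://yourserver.com/path/to/file.html");
// alternatively use str_get_html($html) if you have the html string already...

foreach ($dom->find("h1") as $node)
{
    // innertext is the markup between the opening and closing tags
    echo $node->innertext . "\n";
}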

It is very powerful and can do much, much more.
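
If adding a third-party library is not an option, roughly the same result can be sketched with PHP's built-in DOMDocument class instead of simple_html_dom. This is only a minimal sketch: the helper name extractH1Texts and the inline sample markup are made up for illustration, and it reads textContent (plain text only), whereas innertext above also keeps any nested markup.

```php
<?php
// Hypothetical helper: returns the text of every <h1> in an HTML string.
function extractH1Texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally so malformed real-world HTML stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('h1') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup modelled on the question; a real page could be fetched
// first, for example with file_get_contents($url).
$html = '<html><head><title></title></head><body>'
      . '<h1>Test Here</h1><h1>Test here</h1></body></html>';

foreach (extractH1Texts($html) as $text) {
    echo $text, PHP_EOL;
}
```

With the sample string above, this prints Test Here and Test here on separate lines.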

Byron Whitlock
Yeah, I would also recommend using simple_html_dom, because writing a regex for this is more complicated.
streetparade
@Byron Haha.. Well your example certainly looks simpler...
Peter Ajtai
+1 for using an HTML parser, not regex
Pete