views: 500
answers: 5

I have an HTML page containing a lot of meta tags, and I want to parse them to find certain ones. Here is the code I am using, but it's not picking up any of the tags.

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHtml($contents);
$metaChildren = $dom->getElementsByTagName('meta');
var_dump($metaChildren);

Here is a snippet of the HTML I am using:

<meta name="GZPlatform" content=" pc">
<meta name="GZFeatured" content=" Gone Gold">
<meta name="GZHeadline" content=" pc">
<meta name="GZP_ID" content=" pc 21153">

Any ideas?

A: 

My guess would be that the HTML is not valid and that the $dom->loadHtml() call is failing. I believe that call returns true on success and false on failure, so maybe something like this:

if ($dom->loadHtml($contents)) {
    $metaChildren = $dom->getElementsByTagName('meta');
} else {
    // Loading failed: report or log the error here instead of carrying on.
}
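As far as I know, DOMDocument::loadHTML() uses libxml's forgiving HTML parser, so markup that isn't valid HTML usually just produces warnings and still loads; a false return signals an outright failure. The sketch below checks both the return value and the buffered parser messages. It assumes PHP 7.1+ (for the nullable return type); the helper name loadHtmlChecked and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: load HTML, collect libxml parser messages instead of
// letting them surface as PHP warnings, and return null if loading failed.
function loadHtmlChecked(string $html, array &$errors = []): ?DOMDocument
{
    $dom = new DOMDocument();
    // Buffer libxml errors internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    $ok = $dom->loadHTML($html);
    // Keep a readable copy of every buffered message (LibXMLError objects).
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $dom : null;
}

$errors = [];
$dom = loadHtmlChecked(
    '<html><head><meta name="GZPlatform" content=" pc"></head><body></body></html>',
    $errors
);
if ($dom === null) {
    echo "loadHTML() failed\n";
} else {
    echo 'meta tags: ', $dom->getElementsByTagName('meta')->length, "\n";
    echo 'parser messages: ', count($errors), "\n";
}
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.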
jaywon
I didn't realize that it had to be valid HTML.
Jonathan Kushner
It loaded just fine
Jonathan Kushner
Actually, I looked at the documentation, and you are right; I'm used to using XML parsers. In either case, though, it's a good idea to check that the load succeeded before continuing.
jaywon
A: 

Use the PHP Simple DOM Library

Jonathan Kushner
A: 

Are you sure the tags aren't being matched? What is the output of var_dump? What value do you get when you use var_dump($metaChildren->length)? Your code seems to work here:

<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHtmlFile('test.html'); // test.html holds the meta tags from the question
$metaChildren = $dom->getElementsByTagName('meta');
// Print each <meta> element's name and content attributes.
for ($i = 0; $i < $metaChildren->length; $i++) {
  $el = $metaChildren->item($i);
  print $el->getAttribute('name') . '=' . $el->getAttribute('content') . "\n";
}
?>

Gives output:

GZPlatform= pc
GZFeatured= Gone Gold
GZHeadline= pc
GZP_ID= pc 21153
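Since the goal is to find certain meta tags, a DOMXPath query can filter on the name attribute instead of looping over every <meta>. This is a sketch assuming the tags of interest all share the "GZ" prefix seen in the question's snippet. The inline sample HTML, including the extra description tag, is a hypothetical stand-in for test.html.

```php
<?php
// Hypothetical sample input reproducing the meta tags from the question,
// plus one unrelated tag that the query should skip.
$html = <<<'HTML'
<html><head>
<meta name="GZPlatform" content=" pc">
<meta name="GZFeatured" content=" Gone Gold">
<meta name="GZHeadline" content=" pc">
<meta name="GZP_ID" content=" pc 21153">
<meta name="description" content="not wanted">
</head><body></body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only <meta> elements whose name attribute starts with "GZ".
$gzMeta = [];
foreach ($xpath->query('//meta[starts-with(@name, "GZ")]') as $meta) {
    // Map name => content, trimming the leading space seen in the sample.
    $gzMeta[$meta->getAttribute('name')] = trim($meta->getAttribute('content'));
}
print_r($gzMeta);
```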
John
A: 

Could it be that the parser expects you to close the meta tags?

<meta name="name" />

or

<meta name="name"></meta>
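One way to test this guess is to parse each spelling of the tag and count the <meta> elements that come out. This is a minimal sketch; the three sample strings are hypothetical, and libxml warnings (for example about a stray </meta> end tag) are buffered so they don't clutter the output.

```php
<?php
// Three spellings of the same meta tag: unclosed, self-closed, explicit end tag.
$variants = [
    'unclosed'    => '<html><head><meta name="name" content="x"></head></html>',
    'self-closed' => '<html><head><meta name="name" content="x" /></head></html>',
    'end tag'     => '<html><head><meta name="name" content="x"></meta></head></html>',
];

libxml_use_internal_errors(true); // buffer parser warnings instead of printing them
$counts = [];
foreach ($variants as $label => $html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $counts[$label] = $dom->getElementsByTagName('meta')->length;
    echo $label, ': ', $counts[$label], "\n";
}
libxml_clear_errors();
```

If all three variants report the same count, closing the tags is probably not what makes the difference.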
Pekka
A: 

I think I remember getting unexpected results with var_dump on DOM objects. See John's answer.
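Depending on the PHP version, var_dump() on a DOMNodeList may not show the matched elements, which can make a working query look empty. This sketch inspects the list through its length property and each node's serialized markup instead. The inline HTML is a hypothetical stand-in for the question's $contents.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><head><meta name="GZPlatform" content=" pc"><meta name="GZP_ID" content=" pc 21153"></head></html>');
$metaChildren = $dom->getElementsByTagName('meta');

// The number of matched elements is exposed directly on the DOMNodeList.
echo 'matched: ', $metaChildren->length, "\n";

// Print each element's markup rather than dumping the list object itself.
foreach ($metaChildren as $meta) {
    echo $dom->saveHTML($meta), "\n";
}
```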

David W