I tried find('meta[http-equiv="Content-type"]') but it failed to retrieve that information.

A: 

The Content-Type is typically sent in the HTTP response headers, not in the body. Where did you get the XML document from?
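If the header is what you are after, here is a minimal sketch of picking it out of the raw header lines. The helper name `contentTypeFromHeaders` and the sample data are made up for illustration; PHP's `get_headers($url)` returns an array of lines in this shape, but the sketch uses a hard-coded array so it runs without a network connection.

```php
<?php
// Hypothetical helper: pick the Content-Type value out of raw header lines,
// such as the array that PHP's get_headers($url) returns.
function contentTypeFromHeaders(array $headerLines): ?string
{
    $found = null;
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match with the /i flag.
        if (preg_match('/^content-type:\s*(.+)$/i', trim($line), $m)) {
            // Keep the last occurrence, e.g. the final response after redirects.
            $found = trim($m[1]);
        }
    }
    return $found;
}

// Sample data standing in for a real response (an assumption, not real output).
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=ISO-8859-1',
    'Cache-Control: private',
];
echo contentTypeFromHeaders($sample), "\n";
```

The function returns null when no Content-Type line is present, so the caller can fall back to the meta tag in the body.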

troelskn
Right from `file_get_html`
A: 

I would run a foreach over $this->find('meta') and check each element myself, in case the content-type is written with different casing. I think browsers aren't case-sensitive here, while PHP might be.
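A rough sketch of that loop, comparing the attribute with strcasecmp() so the casing doesn't matter. Note that this uses PHP's built-in DOMDocument rather than SimpleHTMLDom, only so that it runs without any extra files; the helper name `metaContentType` and the sample HTML are made up for the example.

```php
<?php
// Hypothetical helper; uses the built-in DOM extension, not SimpleHTMLDom.
function metaContentType(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // strcasecmp() returns 0 when both strings are equal ignoring case.
        if (strcasecmp($meta->getAttribute('http-equiv'), 'content-type') === 0) {
            return $meta->getAttribute('content');
        }
    }
    return null; // no matching meta element found
}

// Sample markup (an assumption for the example).
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
      . '</head><body></body></html>';
echo metaContentType($html), "\n";
```

With SimpleHTMLDom the same idea would be a foreach over find('meta') with the same strcasecmp() check on each element's http-equiv attribute.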

Adam Kiss
A: 

SimpleHTMLDom doesn't use quoted string literals in the selector; it's just elem[attr=value]. The comparison of the value also seems to be case-sensitive (there may be a way to make it case-insensitive, but I don't know of one).*

E.g.

require 'simple_html_dom.php';
$html = file_get_html('http://www.google.com/');
// most likely only one element, but foreach doesn't hurt
foreach( $html->find('meta[http-equiv=content-type]') as $ct ) { 
  echo $ct->content, "\n";
}

prints text/html; charset=ISO-8859-1.

*edit: yes, there is a way to perform a case-insensitive match: use *= instead of =

find('meta[http-equiv*=content-type]')

edit2: btw, http-equiv*=content-type would also match <meta http-equiv="haha-no-content-types"... (it only tests whether the string occurs somewhere in the attribute's value). But it's the only case-insensitive function/operator I could find. I guess you can live with it in this case ;-)

edit3: It uses preg_match('.../i'), and the pattern from the selector is passed directly to that function. Therefore you could do something like http-equiv*=^content-type$ to match http-equiv="Content-type" but not http-equiv="xyzContent-typeabc". But I don't know if this is a guaranteed feature.
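To see the difference between the two patterns without involving the library at all, here is a small sketch with plain preg_match(). The function `looseAttributeMatch` is a made-up stand-in that assumes the behaviour described above (selector value dropped into a '/.../i' pattern); it is not SimpleHTMLDom's actual code.

```php
<?php
// Hypothetical stand-in for the *= comparison described above: the selector
// value is used as a regular expression with the case-insensitive /i flag.
function looseAttributeMatch(string $selectorValue, string $attributeValue): bool
{
    return preg_match('/' . $selectorValue . '/i', $attributeValue) === 1;
}

$values = ['Content-type', 'haha-no-content-types', 'xyzContent-typeabc'];

// First pattern is a plain substring test, second is anchored at both ends.
foreach (['content-type', '^content-type$'] as $pattern) {
    foreach ($values as $value) {
        printf(
            "%-16s vs %-22s => %s\n",
            $pattern,
            $value,
            looseAttributeMatch($pattern, $value) ? 'match' : 'no match'
        );
    }
}
```

The unanchored pattern matches all three values, while the anchored one matches only Content-type. A selector value containing a / would break the pattern in this sketch, since nothing is escaped.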

VolkerK
Thanks, I'll live happily with it!