Hi, I am using simple_html_dom to parse some websites. Is there any way to extract the doctype?

Thanx, Granit

+2  A: 

You can use the file_get_contents function to get all the HTML data from the website. For example:

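<?php
   // Fetch the raw HTML; file_get_contents() returns false on failure.
   $html = file_get_contents("http://google.com");
   // Match the doctype declaration up to its closing '>'.
   // [^>]* also spans line breaks, so newlines need not be stripped first.
   $doctype = null;
   if ($html !== false && preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches)) {
      $doctype = $matches[0];
   }
?>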
antyrat
Unfortunately, I need to implement it using simple_html_dom.php.
Granit
Have you tried looking for the doctype with this code? $html->find('!DOCTYPE')
antyrat
Yes. I have tried $html->find('!DOCTYPE'), $html->find('DOCTYPE'), $html->find('doctype'), and $html->find('!doctype'). None of them worked for me.
Granit
That's strange, because simple_html_dom.php has a piece of code under the comment // doctype, cdata, so this construct should work. Maybe you are using an old version of the simple_html_dom.php file?
antyrat
Here is my code: $resultArray = $this->htmlDom->find('!DOCTYPE'); //var_dump($resultArray); if (is_array($resultArray) }
Granit
What does this code print? $resultArray = $this->htmlDom->find('!'); print_r($resultArray);
antyrat
It prints an empty array: Array()
Granit
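Since find() returns nothing for the doctype in this thread, one possible workaround is to read the doctype from the raw markup string before (or alongside) handing it to simple_html_dom. The sketch below is only an illustration under that assumption: extractDoctype is a hypothetical helper name, not part of simple_html_dom, and the sample markup is made up. It does not handle doctypes with an internal subset that contains a '>' character.

```php
<?php
/**
 * Return the first <!DOCTYPE ...> declaration found in $html, or null if none.
 * extractDoctype is a hypothetical helper, not a simple_html_dom function.
 */
function extractDoctype(string $html): ?string
{
    // [^>]* stops at the first '>' and also spans line breaks,
    // so multi-line HTML 4 / XHTML doctypes are matched as well.
    if (preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

// Example with an inline document (made-up sample markup).
$source = "<!DOCTYPE html>\n<html><head><title>t</title></head><body></body></html>";
$doctype = extractDoctype($source);
echo $doctype === null ? "no doctype found" : $doctype;
```

The same string can then be parsed with simple_html_dom as usual, so the rest of the code keeps working on the DOM while the doctype comes from the helper.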