Hi , I have a form which is accept html data but we need only their respective text not anything else. is there any particular way to extract the text out of the html in php?
regards.
Hi , I have a form which is accept html data but we need only their respective text not anything else. is there any particular way to extract the text out of the html in php?
regards.
You can parse the HTML using DOMDocument::loadHTMLFile
and extract what you need.
$doc = new DOMDocument();
$doc->loadHTMLFile("data.html");
$metaTags = $doc->getElementsByTagName('meta');
// Process $metaTags
surely it can be done :
just look at this function and use it as u like :
function html2txt ( $document )
{
$search = array ("'<script[^>]*?>.*?</script>'si", // strip out javascript
"'<[\/\!]*?[^<>]*?>'si", // strip out html tags
"'([\r\n])[\s]+'", // strip out white space
"'@<![\s\S]*?�[ \t\n\r]*>@'",
"'&(quot|#34|#034|#x22);'i", // replace html entities
"'&(amp|#38|#038|#x26);'i", // added hexadecimal values
"'&(lt|#60|#060|#x3c);'i",
"'&(gt|#62|#062|#x3e);'i",
"'&(nbsp|#160|#xa0);'i",
"'&(iexcl|#161);'i",
"'&(cent|#162);'i",
"'&(pound|#163);'i",
"'&(copy|#169);'i",
"'&(reg|#174);'i",
"'&(deg|#176);'i",
"'&(#39|#039|#x27);'",
"'&(euro|#8364);'i", // europe
"'&a(uml|UML);'", // german
"'&o(uml|UML);'",
"'&u(uml|UML);'",
"'&A(uml|UML);'",
"'&O(uml|UML);'",
"'&U(uml|UML);'",
"'ß'i",
);
$replace = array ( "",
"",
" ",
"\"",
"&",
"<",
">",
" ",
chr(161),
chr(162),
chr(163),
chr(169),
chr(174),
chr(176),
chr(39),
chr(128),
"ä",
"ö",
"ü",
"�",
"�",
"�",
"�",
);
$text = preg_replace($search,$replace,$document);
return trim ( $text );
}