Hi, I have a form that accepts HTML input, but we only need the text it contains and nothing else. Is there a particular way to extract the text from HTML in PHP?

Regards.

+4  A: 

Use strip_tags().
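
As a minimal sketch of that suggestion (the sample string is made up for illustration), strip_tags() drops the markup and keeps the text between the tags:

```php
<?php
// Hypothetical sample input; any HTML string would do here.
$html = '<div style="text-align: center;"><p>Hello <b>world</b></p></div>';

// strip_tags() removes the tags and keeps the text between them.
$text = strip_tags($html);

echo $text, PHP_EOL; // prints "Hello world"
```

Note that strip_tags() does not decode entities such as &amp;amp; and it keeps the text inside script and style elements, so those may need separate handling.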

Thomas
It doesn't work perfectly for me. Here is an example: $htmlVar = "<div ">آمار و احتمالات کاربردی ">ن رشته‌‏هاى علوم ادارى، مديريت، بازرگانى، اقتصاد، حسابدارى و ساير رشته‌‏هاى وابسته تنظيم و تدوين شده است.</span></span></div><div style="text-align: justify; "> </div><div style="text-align: center; ">100921</div>"; echo (strip_tags($htmlVar)); Note: due to insufficient space, I've deleted some of the HTML code.
austin powers
That is not HTML code. Try: echo (strip_tags(htmlspecialchars_decode($htmlVar)));
Thomas
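
To illustrate the comment above: assuming the form data arrives with its markup already entity-encoded (the sample string below is hypothetical), strip_tags() finds no real tags to remove until the string has been decoded:

```php
<?php
// Hypothetical entity-encoded input, as it might arrive from a form.
$encoded = '&lt;div style=&quot;text-align: center;&quot;&gt;100921&lt;/div&gt;';

// There are no literal "<" characters yet, so nothing is stripped.
$untouched = strip_tags($encoded);

// Decode the special characters first, then strip the real tags.
$decoded = htmlspecialchars_decode($encoded);
$text = strip_tags($decoded);

echo $text, PHP_EOL; // prints "100921"
```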
+1  A: 

You can parse the HTML using DOMDocument::loadHTMLFile and extract what you need.

$doc = new DOMDocument();
$doc->loadHTMLFile("data.html");
$metaTags = $doc->getElementsByTagName('meta');
// Process $metaTags
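
For form input held in a string rather than a file, a variant along these lines might be closer to what the question asks for. This is only a sketch: the function name html_to_text_dom is made up, and the XML encoding prefix is a commonly used workaround that assumes UTF-8 input.

```php
<?php
// Hypothetical helper: extract the visible text from an HTML string.
function html_to_text_dom(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    // The prefix hints that the input is UTF-8 (assumption about the data).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove script and style elements so their contents are not returned.
    foreach (['script', 'style'] as $tag) {
        $nodes = $doc->getElementsByTagName($tag);
        // The node list is live, so walk it backwards while removing.
        for ($i = $nodes->length - 1; $i >= 0; $i--) {
            $node = $nodes->item($i);
            $node->parentNode->removeChild($node);
        }
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // Collapse runs of whitespace into single spaces.
    return trim((string) preg_replace('/\s+/u', ' ', $root->textContent));
}

echo html_to_text_dom('<div><p>Hello <b>world</b></p><script>alert(1)</script></div>'), PHP_EOL;
```

Compared with strip_tags(), the parser also decodes entities and makes it easy to skip whole elements.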
Emil Ivanov
+2  A: 

It can certainly be done.

Take a look at this function and use it as you like:

function html2txt ( $document )
{
        // Patterns and replacements are paired by position, so both arrays
        // must hold the same number of entries in the same order.
        $search = array ("'<script[^>]*?>.*?</script>'si",  // strip out javascript
                "'<!--.*?-->'s",            // strip out comments (before the tag pattern)
                "'<[\/\!]*?[^<>]*?>'si",        // strip out html tags
                "'([\r\n])[\s]+'",          // collapse white space
                "'&(quot|#34|#034|#x22);'i",        // replace html entities
                "'&(lt|#60|#060|#x3c);'i",      // added hexadecimal values
                "'&(gt|#62|#062|#x3e);'i",
                "'&(nbsp|#160|#xa0);'i",
                "'&(iexcl|#161);'i",
                "'&(cent|#162);'i",
                "'&(pound|#163);'i",
                "'&(copy|#169);'i",
                "'&(reg|#174);'i",
                "'&(deg|#176);'i",
                "'&(#39|#039|#x27);'",
                "'&(euro|#8364);'i",            // europe
                "'&a(uml|UML);'",           // german
                "'&o(uml|UML);'",
                "'&u(uml|UML);'",
                "'&A(uml|UML);'",
                "'&O(uml|UML);'",
                "'&U(uml|UML);'",
                "'&szlig;'i",
                "'&(amp|#38|#038|#x26);'i",     // decode & last to avoid double decoding
                );
        // Replacement characters are written as literals; this assumes the
        // source file and the input are both UTF-8.
        $replace = array (  "",
                    "",
                    "",
                    " ",
                    "\"",
                    "<",
                    ">",
                    " ",
                    "¡",
                    "¢",
                    "£",
                    "©",
                    "®",
                    "°",
                    "'",
                    "€",
                    "ä",
                    "ö",
                    "ü",
                    "Ä",
                    "Ö",
                    "Ü",
                    "ß",
                    "&",
                );

        $text = preg_replace($search,$replace,$document);

        return trim ( $text );
}
farshad Ghazanfari
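
A shorter variant of the same idea may be worth considering: instead of a hand-written entity table, the built-in html_entity_decode() can do the decoding in a single pass. This is a sketch under the assumption that the input is valid UTF-8; the function name html2txt_decode is made up for illustration.

```php
<?php
// Hypothetical alternative to a hand-maintained entity table.
function html2txt_decode(string $html): string
{
    // Remove script and style elements together with their contents.
    $text = (string) preg_replace('#<(script|style)\b[^>]*>.*?</\1>#si', '', $html);

    // Remove comments, then the remaining tags.
    $text = (string) preg_replace('/<!--.*?-->/s', '', $text);
    $text = strip_tags($text);

    // Decode named and numeric entities in one pass (UTF-8 output).
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = (string) preg_replace('/\s+/u', ' ', $text);

    return trim($text);
}

echo html2txt_decode('<p>Fish &amp; chips &euro;5</p><script>x()</script>'), PHP_EOL;
```

Because the decoding happens once, a doubly encoded sequence such as &amp;amp;lt; comes out as the text &amp;lt; rather than turning into a stray < character.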