Hi, I recommend this:
http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page
It removes all HTML tags, along with the contents of script and style blocks. I also recommend reading the whole linked article, which explains how to combine this with other functions to end up with clean UTF-8 text.
/**
 * Remove HTML tags, including invisible text such as style and
 * script code, and embedded objects. Adds line breaks around
 * block-level tags to prevent word joining after tag removal.
 *
 * PHP's strip_tags() function removes the tags, but it does not remove
 * styles, scripts, and other unwanted text between the tags. When it removes
 * the tags it also joins together the words before and after the tags.
 * For block-level tags, like <p>, this is the wrong thing to do.
 *
 * From:
 * http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page
 *
 * @param string $html HTML source to clean
 * @return string The text with all tags removed
 */
function strip_html_tags( $html )
{
    // Note: the u modifier requires valid UTF-8 input; on invalid input
    // preg_replace() returns null.
    $text = preg_replace(
        array(
            // Remove invisible content
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?>.*?</script>@siu',
            '@<object[^>]*?>.*?</object>@siu',
            '@<embed[^>]*?>.*?</embed>@siu',
            '@<applet[^>]*?>.*?</applet>@siu',
            '@<noframes[^>]*?>.*?</noframes>@siu',
            '@<noscript[^>]*?>.*?</noscript>@siu',
            '@<noembed[^>]*?>.*?</noembed>@siu',
            // Add line breaks before and after blocks
            '@</?((address)|(blockquote)|(center)|(del))@iu',
            '@</?((div)|(h[1-6])|(ins)|(isindex)|(p)|(pre))@iu',
            '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
            '@</?((table)|(th)|(td)|(caption))@iu',
            '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
            '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
            '@</?((frameset)|(frame)|(iframe))@iu',
        ),
        array(
            // One replacement per pattern above, in the same order:
            // nine spaces for the invisible content...
            ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
            // ...and a newline in front of each of the seven block-tag groups
            "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
            "\n\$0",
        ),
        $html );
    $text = strip_tags( $text );
    $text = trim( $text );
    return $text;
}
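The two problems the docblock describes are easy to reproduce. The snippet below is a reduced sketch, not the article's code: the input string and the variable names are made up for illustration, and it only handles `<script>` and `<p>` rather than the full tag list.

```php
<?php
// Hypothetical input: two block-level paragraphs followed by an inline script.
$html = '<p>First paragraph</p><p>Second paragraph</p><script>var x = 1;</script>';

// strip_tags() alone joins the two paragraphs and keeps the script body.
$plain = strip_tags($html);
echo $plain, "\n"; // First paragraphSecond paragraphvar x = 1;

// Reduced version of the same idea: drop <script> blocks first, then put a
// newline in front of every <p> or </p> tag before stripping the tags.
$step  = preg_replace('@<script[^>]*>.*?</script>@si', ' ', $html);
$step  = preg_replace('@</?p\b@i', "\n\$0", $step);
$clean = trim(strip_tags($step));

// Prints "First paragraph", an empty line, then "Second paragraph".
echo $clean, "\n";
```

The full strip_html_tags() function above applies the same two steps to a longer list of tags.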
strip_html_tags() will convert something like:
<p><b>Welcome</b> to my <a href="example.com">homepage</a></p>
into:
Welcome to my homepage
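As for combining this with other functions to get clean UTF-8 text: after the tags are gone, the text can still contain HTML entities and stray whitespace. Here is a minimal sketch of one possible follow-up step. The helper name `tidy_stripped_text` is hypothetical and the steps are my assumption of a reasonable cleanup, so the linked article's own pipeline may differ.

```php
<?php
// Hypothetical helper (not from the linked article): tidy text whose tags
// have already been removed.
function tidy_stripped_text(string $text): string
{
    // Turn entities such as &amp; and &eacute; into UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    // Drop spaces next to line breaks, then squeeze 3+ newlines into 2.
    $text = preg_replace('/ ?\n ?/', "\n", $text);
    $text = preg_replace('/\n{3,}/', "\n\n", $text);
    return trim($text);
}

// Prints "Café & bar", an empty line, then "Next".
echo tidy_stripped_text("Caf&eacute;  &amp;   bar\n\n\n\nNext"), "\n";
```

In practice it would be called on the function's result, for example `tidy_stripped_text(strip_html_tags($html))`, once the input has been converted to UTF-8.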