Using PHP, given a string such as: this is a <strong>string</strong>, I need a function that strips out ALL HTML tags so that the output is: this is a string. Any ideas? Thanks in advance.

+11  A: 

PHP has a built-in function that does exactly what you want: strip_tags

$text = '<b>Hello</b> World';
print strip_tags($text); // outputs Hello World

If you expect broken HTML, you are going to need to load it into a DOM parser and then extract the text.
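
For the broken-HTML case, a minimal sketch of the DOM approach might look like the following. It assumes the bundled dom extension is available; the helper name html_to_text() is hypothetical and not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): parse possibly malformed HTML
// with DOMDocument and return only its text content.
function html_to_text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Caveat: without an encoding hint, non-ASCII input may be misread.
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    return $root === null ? '' : trim($root->textContent);
}

// An unclosed <p> and <strong> are repaired by the parser.
echo html_to_text('<p>this is a <strong>string'), "\n"; // prints: this is a string
```

Note that textContent also includes the contents of script and style elements, so those may need to be removed separately.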

Paolo Bergantino
+1, but be careful: strip_tags may not strip invalid HTML tags, so depending on the application you may need to do some extra processing afterwards.
Miky Dinescu
strip_tags() is a poor defence against XSS, as it only guards against a couple of XSS attack vectors. Use htmlspecialchars($var, ENT_QUOTES) instead.
Rook
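
To illustrate the htmlspecialchars() suggestion in the comment above, here is a small sketch. The sample input and the explicit 'UTF-8' argument are assumptions made for the example. Instead of removing markup, it escapes it so the browser displays it as text.

```php
<?php
// Sample untrusted input (an assumption for this example).
$input = '<a href="#" onclick="alert(1)">click</a>';

// ENT_QUOTES escapes both double and single quotes, in addition to <, > and &.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// prints: &lt;a href=&quot;#&quot; onclick=&quot;alert(1)&quot;&gt;click&lt;/a&gt;
```

Escaping and stripping serve different goals: escaping keeps the original characters visible to the reader, while strip_tags discards the tags.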
+5  A: 

What about using strip_tags, which should do the job?

For instance (quoting the documentation):

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

will give you:

Test paragraph. Other text

Edit: note that strip_tags doesn't validate what you give it, which means that this code:

$text = "this is <10 a test";
var_dump(strip_tags($text));

will get you:

string 'this is ' (length=8)

(Everything after the thing that looks like a starting tag gets removed).

Pascal MARTIN
+1  A: 

strip_tags is the function you're after. You'd use it something like this:

$text = '<strong>Strong</strong>';
$text = strip_tags($text);
// Now $text = 'Strong'
Mez
A: 

I find this to be a little more effective than strip_tags() alone, since strip_tags() removes the <script> and <style> tags themselves but leaves the JavaScript or CSS between them in the output:

$search = array(
    "'<head[^>]*?>.*?</head>'si",
    "'<script[^>]*?>.*?</script>'si",
    "'<style[^>]*?>.*?</style>'si",
);
$replace = array("","",""); 
$text = strip_tags(preg_replace($search, $replace, $html));
Stephen J. Fuhry
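
As a runnable illustration of the regex-then-strip_tags approach in the last answer, the sketch below wraps it in a function and compares it with plain strip_tags. The function name strip_html() and the sample input are assumptions, and the patterns use # delimiters rather than the single quotes in the answer.

```php
<?php
// Hypothetical wrapper around the approach from the answer above.
function strip_html(string $html): string
{
    // Remove whole head, script and style elements, contents included.
    $patterns = [
        '#<head[^>]*>.*?</head>#si',
        '#<script[^>]*>.*?</script>#si',
        '#<style[^>]*>.*?</style>#si',
    ];
    $withoutBlocks = preg_replace($patterns, '', $html);

    // preg_replace() returns null on error; fall back to the raw input.
    return strip_tags($withoutBlocks ?? $html);
}

// Sample input (an assumption for this example).
$html = '<style>p { color: red; }</style><p>Hello</p><script>alert("x");</script>';

echo strip_tags($html), "\n"; // prints: p { color: red; }Helloalert("x");
echo strip_html($html), "\n"; // prints: Hello
```

Regular expressions like these only handle well-formed script and style blocks, so this should not be treated as a security filter.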