Question: How do I strip HTML tags but allow the greater and less-than sign using PHP?

If I use PHP's strip_tags() function, it doesn't quite work:

$string = '<p>if A > B</p>';
echo strip_tags($string);  // if A B
// but I want to output "if A > B"

UPDATE

Basically, I only want to allow/display plain text.

A: 

This will strip everything that looks like an HTML tag.

htmlentities(preg_replace('/<\\S.*?>/', '', $text));
amphetamachine
DrJokepu
And `<div title="a>b">`, and `<!-- > -->`, and, and, and...
bobince
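
To see how the regex one-liner above behaves, including on the edge cases raised in the comments, here is a small sketch. The helper name strip_tags_regex is made up for illustration; the pattern and the htmlentities() call are taken verbatim from the answer.

```php
<?php
// Hypothetical helper name; the body is the one-liner from the answer above.
function strip_tags_regex(string $text): string
{
    // Remove "<", one non-space character, then everything up to the next ">",
    // and HTML-escape whatever remains.
    return htmlentities(preg_replace('/<\\S.*?>/', '', $text));
}

// The question's example: the lone ">" survives because no "<" precedes it.
echo strip_tags_regex('<p>if A > B</p>'), "\n";             // if A &gt; B

// "<2 && 6>" looks like a tag to the pattern, so the comparison is lost.
echo strip_tags_regex('<p>1<2 && 6>4</p>'), "\n";           // 14

// A ">" inside an attribute value ends the match too early.
echo strip_tags_regex('<div title="a>b">text</div>'), "\n"; // b&quot;&gt;text
```

The first case works as wanted (a browser renders &gt; as >), but the other two show why the approach is fragile.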
+1  A: 

Unfortunately, the simplest and most reliable way to get this working is to use an HTML parser. This one will do the trick, though I don't know whether it handles HTML fragments like the one above. If it doesn't, wrapping the fragment to make it acceptable HTML should be trivial.

As others have pointed out, parsing HTML with a regexp is difficult and has numerous edge cases to cater for, since HTML is not a regular language.

Brian Agnew
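
As a rough illustration of the parser approach, here is a sketch that uses PHP's bundled DOM extension rather than the specific parser linked in the answer above (that is a substitution on my part). The helper name html_to_text is hypothetical. It wraps the fragment in a minimal document, as the answer suggests, and reads back the text content.

```php
<?php
// Hypothetical helper: extract plain text using PHP's DOM extension (libxml2).
function html_to_text(string $html): string
{
    $doc = new DOMDocument();
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document; the meta tag hints that the input is UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : $body->textContent;
}

// Escape the extracted text before printing it into a page.
echo htmlspecialchars(html_to_text('<p>if A > B</p>')), "\n"; // if A &gt; B
```

How a stray "<" (as in 1<2) is treated may depend on the libxml2 version PHP is built against, so that case is worth testing on the target system before relying on it.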
+2  A: 

You can use HTML Purifier. It works not only with the <p>if A > B</p> example you wrote, but also with the <p>1<2 && 6>4</p> example written by DrJokepu.

Given the input <p>1<2 && 6>4</p> with the allowed elements set to none, HTML Purifier gives the output: 1&lt;2 &amp;&amp; 6&gt;4.

TommyA
A: 

Try this regular expression that I wrote: <([^>]?="(\"|[^"])?")?([^>]?='(\'|[^'])?')?[^>]*?>