Hi! I've written a regular expression to strip BBCode tags. It removes only the allowed tags, so that I can later count the string length without them.

I'm not an expert when it comes to regular expressions, so it took me an hour to arrive at this, which mostly works:

$pattern = "/\[\/?(i|b|u|url(.*?)|list|li)[\]\[]*\]/i";
$stripped = preg_replace($pattern, '', $text);

It strips only the six allowed tags (and nothing else, as intended), including the special tag 'url', which can be extended like 'url=http://someurl'.

For example:

in:  [url=someurl]Lorem[/url] ipsum [test]dolor[/test] sit [b]amet[/b].
out: Lorem ipsum [test]dolor[/test] sit amet.
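
Since the goal is to count the length without the tags, the intended use might look like the minimal sketch below. The helper name `visibleLength` is made up for illustration, and `strlen` counts bytes, so `mb_strlen` would probably be the better choice for multibyte text.

```php
<?php
// Hypothetical helper (not part of the original code): length of $text
// once the allowed BBCode tags have been stripped.
function visibleLength(string $text): int
{
    $pattern = '/\[\/?(i|b|u|url(.*?)|list|li)[\]\[]*\]/i';
    $stripped = preg_replace($pattern, '', $text);
    // strlen() counts bytes; swap in mb_strlen() for multibyte input.
    return strlen($stripped);
}

$in = '[url=someurl]Lorem[/url] ipsum [test]dolor[/test] sit [b]amet[/b].';
// Prints the length of "Lorem ipsum [test]dolor[/test] sit amet."
echo visibleLength($in), "\n";
```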

The problem is that it strips not only 'url=[sometext]' but also tags like 'urlipsum'. I tried adding an '=' to the pattern but couldn't get it to work.

Does anyone have a hint for how to strip url only when it is followed by an '='?
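
To make the over-matching concrete, here is a small reproduction. The sample string is made up for illustration; the pattern is the one shown above, written as a single-quoted string.

```php
<?php
// url(.*?) accepts any characters after "url", so [urlipsum] matches too.
$pattern = '/\[\/?(i|b|u|url(.*?)|list|li)[\]\[]*\]/i';

// Made-up sample: [urlipsum] is not an allowed tag and should survive.
$text = 'Lorem [urlipsum]dolor[/urlipsum] sit [url=http://someurl]amet[/url].';

$stripped = preg_replace($pattern, '', $text);
echo $stripped, "\n"; // Lorem dolor sit amet.
```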

+1  A: 

Try:

$pattern = '/\[\/?(i|b|u|url(=[^\]]+)?|list|li)[\]\[]*\]/i';
cletus
This is great! Thank you very much! I added protocols as an extra check, since I don't want anything other than http(s), ftp and mailto: "/\[\/?(i|b|u|url(=(http|https|ftp|mailto)[^\]]+)?|list|li)[\]\[]*\]/i"
lorem monkey
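
As a quick check of the two patterns above, the sketch below runs both against made-up sample strings (the variable names and the gopher URL are illustrative assumptions, not from the thread).

```php
<?php
// Pattern from the answer: "url" may be followed by "=" plus anything up to "]".
$answerPattern = '/\[\/?(i|b|u|url(=[^\]]+)?|list|li)[\]\[]*\]/i';
// Variant from the comment: the value must start with one of four scheme names.
$commentPattern = '/\[\/?(i|b|u|url(=(http|https|ftp|mailto)[^\]]+)?|list|li)[\]\[]*\]/i';

// Made-up samples for illustration.
$allowed = '[url=http://someurl]Lorem[/url] [urlipsum]dolor[/urlipsum] [b]sit[/b]';
$other   = '[url=gopher://someurl]amet[/url]';

$a1 = preg_replace($answerPattern, '', $allowed);
$c1 = preg_replace($commentPattern, '', $allowed);
$a2 = preg_replace($answerPattern, '', $other);
$c2 = preg_replace($commentPattern, '', $other);

echo $a1, "\n"; // Lorem [urlipsum]dolor[/urlipsum] sit
echo $c1, "\n"; // Lorem [urlipsum]dolor[/urlipsum] sit
echo $a2, "\n"; // amet
echo $c2, "\n"; // [url=gopher://someurl]amet
```

In the last case the opening tag is kept because its scheme is not on the list, but the closing [/url] is still removed, which may matter when counting lengths.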
A: 
  $pattern = "/\[\/?(i|b|u|url=(.*?)|url(?=\])|list|li)[\]\[]*\]/i";
hobodave
This also strips the url tag when it is written without an equals sign. I'll keep it in mind.
lorem monkey
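
A minimal sketch of how the lookahead variant above behaves, using a made-up sample string: a bare [url] and [url=...] are both removed, while [urlipsum] is left alone.

```php
<?php
// url=(.*?) handles the "=" form; url(?=\]) only matches "url" directly before "]".
$pattern = '/\[\/?(i|b|u|url=(.*?)|url(?=\])|list|li)[\]\[]*\]/i';

// Made-up sample for illustration.
$text = '[url]Lorem[/url] [url=someurl]ipsum[/url] [urlipsum]dolor[/urlipsum]';

$stripped = preg_replace($pattern, '', $text);
echo $stripped, "\n"; // Lorem ipsum [urlipsum]dolor[/urlipsum]
```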
A: 

You may want to change the "greediness" of the quantifiers: try adding the "U" pattern modifier, or remove the question mark in ".*?". See the PHP documentation on pattern modifiers.

gb
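
To illustrate what the greediness options do, here is a small sketch on a made-up subject string. Note that the U modifier inverts greediness, so combined with ".*?" it makes that quantifier greedy again. In this sketch, changing greediness alone still removes [urlipsum] when applied to the question's pattern.

```php
<?php
$subject = 'abcabc';

preg_match('/a.*c/', $subject, $m);
$greedy = $m[0];        // abcabc (greedy by default)
preg_match('/a.*?c/', $subject, $m);
$lazy = $m[0];          // abc (lazy)
preg_match('/a.*c/U', $subject, $m);
$ungreedy = $m[0];      // abc (U makes .* lazy)
preg_match('/a.*?c/U', $subject, $m);
$flipped = $m[0];       // abcabc (U flips .*? back to greedy)

echo "$greedy $lazy $ungreedy $flipped\n";

// The question's pattern with U added: [urlipsum] is still stripped.
$pattern = '/\[\/?(i|b|u|url(.*?)|list|li)[\]\[]*\]/iU';
$stripped = preg_replace($pattern, '', 'Lorem [urlipsum]dolor sit');
echo $stripped, "\n"; // Lorem dolor sit
```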