ansaurus

Question

Answer 1

+2 A:

Responding to: "Any better idea?" (I'm assuming this was an invitation for more than just improvements on the bbcode-specific suggestions)

We recently looked at going the BBCode route and decided to use HTML Purifier instead. This decision was based in part on the (admittedly, probably biased) comparisons between various methods listed by the HTML Purifier group here and the discussion of BBCode (again, by the HTML Purifier group) here

And for the record, I think your English was very good. I'm sure it's much better than I could do in your native language.

codemonkey 2009-01-28 19:32:48

Ah, thank you. I'll probably include HTML Purifier. But because I'm not really a fan of things like FCKeditor, I'd say it will mostly be used to purify the HTML output. It looks very nice, though.

Sybiam 2009-01-28 20:29:53

Answer 2

+7 A:

There are several existing libraries for parsing BBCode. It may be easier to look into those than to roll your own.

Here are a couple; I'm sure there are more if you look around:
PECL bbcode
PEAR HTML_BBCodeParser

Chad Birch 2009-01-28 19:36:02

Answer 3

A:

There are both PECL and PEAR BBCode parsing libraries. Software's hard enough without reinventing years of work on your own.

If neither of those is an option, I'd concentrate on turning the BBCode into a valid XML string, and then using your favorite XML parsing routine on that. This is a very, very rough idea, but:

Run the code through htmlspecialchars to escape any entities that need escaping
Transform all [ and ] characters into < and > respectively
Don't forget to account for the colon in cases like [tagname:

If the BBCode was nested properly, you should be all set to pass this string into an XML parsing object (SimpleXML, DOMDocument, etc.)
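For illustration only, here is a minimal sketch of those steps in Python rather than PHP: `html.escape` stands in for htmlspecialchars and `xml.etree.ElementTree` for SimpleXML/DOMDocument. The function name `bbcode_to_tree`, the `<bbcode>` wrapper element, and the choice to drop anything after a colon inside a tag are assumptions made for the example, not part of the answer. It also converts only brackets that form a simple `[tag]` or `[/tag]`, not every bracket, and it is meant for pulling information out of BBCode, not for producing HTML to send back to a browser.

```python
import html
import re
import xml.etree.ElementTree as ET


def bbcode_to_tree(source):
    """Parse simple, properly nested BBCode into an ElementTree element."""
    # Step 1: escape &, < and > so literal markup in the input stays text.
    escaped = html.escape(source, quote=False)
    # Step 2 (assumption): drop a ":suffix" inside a tag,
    # e.g. [b:abc123] becomes [b].
    escaped = re.sub(r"\[(/?[a-zA-Z]+):[^\]]*\]", r"[\1]", escaped)
    # Step 3: turn [tag] and [/tag] into <tag> and </tag>.
    # Tags with arguments such as [url=...] are left as plain text here.
    xml_text = re.sub(r"\[(/?[a-zA-Z]+)\]", r"<\1>", escaped)
    # Wrap in a single root element (hypothetical name) so the string
    # is one well-formed XML document.
    return ET.fromstring("<bbcode>" + xml_text + "</bbcode>")


if __name__ == "__main__":
    tree = bbcode_to_tree("[b]hi [i]there[/i][/b] <script>")
    for element in tree.iter():
        print(element.tag, repr(element.text))
```

Badly nested input such as `[b][i]x[/b][/i]` makes the XML parser raise `xml.etree.ElementTree.ParseError`, so the caller has to decide what to do with malformed posts.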

Alan Storm 2009-01-28 21:06:59

That's a horrible idea. What would [script] ... [/script] do?

Charlie Somerville 2009-12-22 06:47:16

Yeah, that's pretty awful if you're planning on outputting HTML back. What I wrote assumed you're parsing the BBCode to pull out information. If you're using anything but the official BBCode parsers (mentioned in the first paragraph), you're bound to leave yourself open to an XSS attack.

Alan Storm 2009-12-22 20:02:57

Answer 4

A:

Use preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag to split the source into tags and non-tags. Then iterate over the pieces, keeping a stack of open blocks: when you see an opening tag, push it onto an array; when you see a closing tag, pop elements off the end of the array until the closing tag matches its opening tag.
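A rough sketch of that split-and-stack idea, written in Python for illustration: `re.split` with a capturing group plays the role of preg_split() with delimiter capture. The function name `tokenize`, the event tuples, the tag pattern, and the decision to treat a closing tag with no matching opener as plain text are all assumptions made for this example.

```python
import re

# One capturing group so re.split keeps the tags in its result.
SPLIT_RE = re.compile(r"(\[/?[a-zA-Z]+(?:=[^\]]*)?\])")
# Used to pull the slash and the tag name out of a single tag.
TAG_RE = re.compile(r"\[(/?)([a-zA-Z]+)(?:=[^\]]*)?\]")


def tokenize(source):
    """Return (kind, value) events with every opened tag closed in order."""
    stack = []   # names of the tags that are currently open
    events = []
    for piece in SPLIT_RE.split(source):
        if not piece:
            continue  # re.split yields empty strings around adjacent tags
        match = TAG_RE.fullmatch(piece)
        if match is None:
            events.append(("text", piece))
            continue
        name = match.group(2).lower()
        if not match.group(1):
            # Opening tag: remember it.
            stack.append(name)
            events.append(("open", name))
        elif name in stack:
            # Closing tag: pop until the matching opener is closed.
            while stack:
                top = stack.pop()
                events.append(("close", top))
                if top == name:
                    break
        else:
            # Assumption: a stray closing tag is kept as literal text.
            events.append(("text", piece))
    # Close whatever is still open at the end of the input.
    while stack:
        events.append(("close", stack.pop()))
    return events


if __name__ == "__main__":
    for event in tokenize("[b]bold [i]both[/b] plain"):
        print(event)
```

With the input above, the unclosed `[i]` is closed just before `[/b]`, so the events always come out properly nested even when the source is not.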

porneL 2010-03-09 20:47:26


Best way to parse bbcode
