I'd like to write a BBCode filter for a PHP website (I'm using CakePHP, so it would be a BBCode helper). I have a few requirements.
BBCode tags can be nested, so something like this is valid:
[block]
[block]
[/block]
[block]
[block]
[/block]
[/block]
[/block]
BBCode tags can have zero or more parameters.
Example:
[video: url="url", width="500", height="500"]Title[/video]
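To make the parameter syntax concrete, here is a minimal sketch of how I imagine reading an opening tag. The function names parseTagParams and parseOpeningTag are hypothetical, and I'm assuming that values are always double-quoted and never contain a double quote themselves.

```php
<?php
// Hypothetical helper: turn a raw parameter string such as
// 'url="url", width="500", height="500"' into an associative array.
function parseTagParams(string $raw): array
{
    $params = [];
    // Assumption: each parameter is name="value" with no quote inside the value.
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $raw, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $params[$m[1]] = $m[2];
    }
    return $params;
}

// Hypothetical helper: split an opening tag into its name and parameters.
// Returns null when the string is not an opening tag (for example "[/video]").
function parseOpeningTag(string $tag): ?array
{
    if (!preg_match('/^\[(\w+)(?::\s*(.*))?\]$/s', $tag, $m)) {
        return null;
    }
    return ['name' => $m[1], 'params' => parseTagParams($m[2] ?? '')];
}

$tag = parseOpeningTag('[video: url="url", width="500", height="500"]');
```

With this sketch, a tag without parameters such as [block] simply yields an empty params array, so both cases go through the same code path.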
BBCode tags might have multiple behaviours.
For example, [url]text[/url] would be transformed to [url:url="text"]text[/url], or the video BBCode would be able to choose between YouTube, Dailymotion, and so on.
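As a rough sketch of those two behaviours, assuming the tag has already been split into a parameter array and a body: normalizeUrlTag and videoProvider are hypothetical names, and the host-to-provider table is only an illustration, not a complete list.

```php
<?php
// Hypothetical normalizer: when [url] has no "url" parameter, use its body,
// so [url]text[/url] behaves like [url:url="text"]text[/url].
function normalizeUrlTag(array $params, string $body): array
{
    if (!isset($params['url'])) {
        $params['url'] = trim($body);
    }
    return $params;
}

// Hypothetical provider lookup for [video], keyed on the URL's host name.
// Returns null when the host is missing or not in the (illustrative) table.
function videoProvider(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $host = strtolower($host);
    $providers = [
        'youtube.com'     => 'youtube',
        'youtu.be'        => 'youtube',
        'dailymotion.com' => 'dailymotion',
    ];
    foreach ($providers as $domain => $name) {
        $suffix = '.' . $domain;
        // Accept the domain itself or any subdomain such as "www.".
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return $name;
        }
    }
    return null;
}
```

Each tag would then need its own small function like these, which is part of why I'd like the parser itself to stay generic.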
I think that covers most of my needs. I have already done something with regex, but my biggest problem was matching parameters. I got nested BBCode and BBCode with zero parameters to work, but when I added a regex match for parameters, it no longer matched nested BBCode correctly.
"\[($tag)(=.*)\"\](.*)\[\/\1\]"
// It wasn't .* but the non-greedy matcher
I don't have the complete regex with me right now, but I had something that looked like the one above.
So is there a way to match BBCode efficiently with regex or something else? The only thing I can think of is to use the visitor pattern and split my text on each possible tag. This way I would have a bit more control over the text parsing, and I could probably validate the document, so that if the input text doesn't contain valid BBCode I could notify the user with an error before saving anything.
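To illustrate the validation part of that idea, here is a minimal sketch that uses a regex only to find individual tags and a stack to check the nesting. It is not a full parser or visitor; validateBbcode is a hypothetical name, and I'm assuming that parameters never contain a "]" character.

```php
<?php
// Hypothetical validator: returns a list of error messages (empty when valid).
// $knownTags is the list of tag names to recognise, e.g. ['block', 'video'].
function validateBbcode(string $text, array $knownTags): array
{
    $errors = [];
    $stack = []; // names of the tags that are currently open
    $names = implode('|', array_map(function ($t) {
        return preg_quote($t, '/');
    }, $knownTags));
    // Matches [tag], [tag: params] and [/tag], for known tag names only.
    $pattern = '/\[(\/?)(' . $names . ')(?::[^\]]*)?\]/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = $m[2];
        if ($m[1] !== '/') {
            $stack[] = $name; // opening tag: remember it
            continue;
        }
        $open = array_pop($stack); // closing tag: must match the last open one
        if ($open === null) {
            $errors[] = "Unexpected [/$name] with no open tag";
        } elseif ($open !== $name) {
            $errors[] = "Expected [/$open] but found [/$name]";
        }
    }
    // Anything left on the stack was never closed.
    foreach (array_reverse($stack) as $name) {
        $errors[] = "Missing [/$name]";
    }
    return $errors;
}

$errors = validateBbcode('[block][block][/block]', ['block']);
```

Because the regex never has to match an opening tag together with its closing tag, nesting the same tag inside itself is no longer a problem in this sketch; the stack does that pairing instead.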
I would use SableCC to create my text parser: http://sablecc.org/
Any better ideas? Or anything that could lead to an efficient, flexible BBCode parser?
Thank you, and sorry for my bad English...