Hi.

I need help finding a viable way to convert BBCode to HTML. This is what I have so far, but it fails when BBCode tags are nested.

Src:

 [quote id="ohoh81"]asdasda
     [quote id="ohoh80"]adsad
         [quote id="ohoh79"]asdad[/quote]
     [/quote]
 [/quote]

Code:

<xsl:variable name="rules">
    <code check="&#xD;" >&lt;br/&gt;</code>
    <code check="\&#91;(quote)(.*)\&#93;" >&lt;span class=&#34;quote&#34;&gt;</code>
</xsl:variable>

<xsl:template match="text()" mode="BBCODE">
  <xsl:call-template name="REPLACE_EM_ALL">
    <xsl:with-param name="text" select="." />
    <xsl:with-param name="pos" select="number(1)" />
  </xsl:call-template>
</xsl:template>

<xsl:template name="REPLACE_EM_ALL">
  <xsl:param name="text" />
  <xsl:param name="pos" />
  <xsl:variable name="newText" select="replace($text, ($rules/code[$pos]/@check), ($rules/code[$pos]))" />
  <xsl:choose>
    <xsl:when test="$rules/code[$pos +1]">
      <xsl:call-template name="REPLACE_EM_ALL">
        <xsl:with-param name="text" select="$newText" />
        <xsl:with-param name="pos" select="$pos+1" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of disable-output-escaping="yes" select="$newText" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
+1  A: 

This is probably a bad idea because XSLT is designed to handle well-formed XML, not arbitrary text. I'd suggest preprocessing the BBCode first: replace the left and right brackets with < and >, do whatever else is needed to make it well-formed XML, and then process it with XSLT.

Jim Garrison
XSLT 2.0 has the `xsl:analyze-string` instruction, which is pretty awesome for processing non-XML text.
Pavel Minaev
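A minimal sketch of XSLT 2.0's `xsl:analyze-string`, applied to the line-break rule from the question. It emits real `<br/>` elements instead of escaped markup, so `disable-output-escaping` is not needed. The `BBCODE` mode name comes from the question; the regex is an assumption for illustration, and this is not a complete converter.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same mode name as in the question; this only handles line breaks. -->
  <xsl:template match="text()" mode="BBCODE">
    <!-- Match CRLF, a lone CR, or a lone LF. -->
    <xsl:analyze-string select="." regex="\r\n|\r|\n">
      <xsl:matching-substring>
        <!-- Emit a real element rather than an escaped string. -->
        <br/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text between line breaks is copied through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The template only fires when something applies it in that mode, for example `<xsl:apply-templates select="text()" mode="BBCODE"/>`.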
True, but it's still not intended to process a complete non-XML input file. I certainly wouldn't try to do this task purely in XSLT because a general parser for all of BBCode in XSL would be very complex and hard to maintain. BBCode is close enough to XML structure that it would be far easier to represent it as XML and then use the full power of XSLT to convert to XHTML.
Jim Garrison
@Jim: +1. XSLT is a wonderful tool for XML transformation, it's not an all-purpose language...
Erlock
@Jim: of course XSLT is a domain-specific language meant for data transformation (originally only XML), but XSLT 2.0 left that path and expanded to allow input of multiple XML and (Unicode) text files with `unparsed-text()` and `unparsed-text-available()`. So it's still a data-transformation language, but definitely not meant to be bound to XML alone (luckily so, as many problem domains include both XML and text input).
Abel
@Jim: PS: a variant of your suggestion, pre-processing it with XSLT and then post-processing it with XSLT in a micro-pipeline, is a common coding pattern in the XSLT world.
Abel
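The text-input route mentioned in the comments above could look roughly like the sketch below, which reads a plain-text file with `unparsed-text()` after checking it with `unparsed-text-available()`. The parameter name `bbcode-uri`, the file name, the `main` entry template, and the line-per-paragraph output are all hypothetical choices made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: URI of a plain-text file holding BBCode,
       resolved relative to the stylesheet. -->
  <xsl:param name="bbcode-uri" as="xs:string" select="'post.bbcode.txt'"/>

  <!-- Named entry point, so no XML source document is required. -->
  <xsl:template name="main">
    <div class="post">
      <xsl:choose>
        <xsl:when test="unparsed-text-available($bbcode-uri)">
          <!-- Split the raw text into lines and wrap each one. -->
          <xsl:for-each select="tokenize(unparsed-text($bbcode-uri), '\r\n|\r|\n')">
            <p><xsl:value-of select="."/></p>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
          <xsl:message>Cannot read <xsl:value-of select="$bbcode-uri"/></xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

How a named initial template is invoked depends on the processor. In a micro-pipeline, the result of a step like this would be held in a variable and passed on to a second set of templates.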
+2  A: 

I think a more viable approach would be to repeatedly match and replace (via regex) pairs of BBCode tags until you get no more matches. E.g. for [quote] and [url]:

<xsl:function name="my:bbcode-to-xhtml" as="node()*">
  <xsl:param name="bbcode" as="xs:string"/> 
  <xsl:analyze-string select="$bbcode" regex="(\[quote\](.*)\[/quote\])|(\[url=(.*?)\](.*)\[/url\])" flags="s">
    <xsl:matching-substring>
      <xsl:choose>
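        <xsl:when test="regex-group(1)"> <!-- [quote] -->
          <span class="quote">
            <!-- xsl:sequence keeps nested elements; xsl:value-of would flatten them to text -->
            <xsl:sequence select="my:bbcode-to-xhtml(regex-group(2))"/>
          </span>
        </xsl:when>
        <xsl:when test="regex-group(3)"> <!-- [url] -->
          <!-- curly braces make this an attribute value template, so the URL is evaluated -->
          <a href="{regex-group(4)}">
            <xsl:sequence select="my:bbcode-to-xhtml(regex-group(5))"/>
          </a>
        </xsl:when>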
      </xsl:choose>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>
Pavel Minaev
I thought it was commonly agreed here not to recommend regular expressions to parse structured languages. ;)
Tomalak
In practice it will only match a couple of times, so this will be fine. Thanks again, Pavel.
Sveisvei
The regex-group should be regex-group(1) with the above regex. Works in prod btw.
Sveisvei
@Tomalak: I generally recommend against parsing XML with regex simply because it's very hard to get it right for all corner cases (DOCTYPE, character entities, CDATA, correct handling of invalid input, and so on). BBCode is much simpler - in fact, I strongly suspect it was invented by a lazy dev who didn't want to parse something XML-like, so came up with a scheme that's easier to deal with. Besides, `analyze-string` seems to be specifically geared toward parsing text streams (that's why it repeatedly applies the regex, after all).
Pavel Minaev
Sure, but as soon as it gets to attributes (BBCode can have them, AFAIK) or other things that break nesting for a regex, it will fail just as badly as it will fail for XML/HTML/etc. Admittedly: as long as it does not get any more complex than the OP shows in his example (no attributes, no nested comments or HTML), and all BBCode tags are *guaranteed* to be nested correctly, a regex-based approach can work. But it will still be the weak point of the whole construct.
Tomalak
Btw, Pavel: my example shows a recursive template that applies many different regexes. Do you have a recommended easy way of doing this with your function, or should I just write a better general regex and give each tag its value from the match?
Sveisvei
BBcode isn't guaranteed to nest correctly, but the traditional way of handling it is to process the pairs that match, and leave the unmatched ones be, which is what will happen here. I haven't seen attributes proper either, though you can get something like `[url=http://...]...[/url]`, or same thing for `[quote]` - but this is also trivial to deal with as there are no quotes, and no escape chars.
Pavel Minaev
@Sveisvei: I've edited the answer to demonstrate how mixed `[quote]` and `[url]` can be nested, and to show how to extract the parameter in URL and use it.
Pavel Minaev
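For the `[quote id="..."]` form used in the question, the same recursive idea might be extended as in the sketch below. The function name `my:quotes-to-xhtml`, the `urn:example:bbcode` namespace, and the choice to copy the `id` onto the `span` are assumptions for illustration, not part of the answer above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:bbcode"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: converts [quote] and [quote id="..."] pairs. -->
  <xsl:function name="my:quotes-to-xhtml" as="node()*">
    <xsl:param name="bbcode" as="xs:string"/>
    <!-- Group 2 = optional id value, group 3 = quote body (greedy). -->
    <xsl:analyze-string select="$bbcode" flags="s"
        regex='\[quote(\s+id="([^"\]]*)")?\](.*)\[/quote\]'>
      <xsl:matching-substring>
        <span class="quote">
          <!-- Copy the id only when the tag carried one. -->
          <xsl:if test="regex-group(2) != ''">
            <xsl:attribute name="id" select="regex-group(2)"/>
          </xsl:if>
          <!-- Recurse into the body so inner quotes become nested spans. -->
          <xsl:sequence select="my:quotes-to-xhtml(regex-group(3))"/>
        </span>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Unmatched text, including stray tags, is left as-is. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Same mode name as in the question. -->
  <xsl:template match="text()" mode="BBCODE">
    <xsl:sequence select="my:quotes-to-xhtml(.)"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the body group is greedy, this handles quotes nested inside one another, but two sibling quotes in the same text are matched as a single pair and the inner tags are left as literal text. That is the kind of weak point raised in the comments above.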