I'm trying to wrap new lines in paragraphs without eliminating the HTML in the mixed node. I can get one or the other to work, but not both.

XML:

<root>
    <mixed html="true">
        line 1

        <a href="http://google.com">line 2</a>

        <em>line 3</em>
    </mixed>
</root>

desired output:

 <div>
     <p>line 1</p>
     <p><a href="http://google.com">line 2</a></p>
     <p><em>line 3</em></p>
 </div>

these templates match the HTML:

<xsl:template match="//*[@html]//*">
    <xsl:element name="{name()}">
    <xsl:apply-templates select="* | @* | text()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="//*[@html]//@*">
    <xsl:attribute name="{name(.)}">
        <xsl:copy-of select="."/>
    </xsl:attribute>
</xsl:template>

these templates convert new lines to paragraphs:

<xsl:template name="nl2p">

    <xsl:param name="input" />

    <xsl:variable name="output">
        <xsl:call-template name="newline-to-paragraph">
            <xsl:with-param name="input">
                <xsl:copy-of select="$input" />
            </xsl:with-param>
        </xsl:call-template>
    </xsl:variable>

    <xsl:copy-of select="$output" />

</xsl:template>

<!-- convert newline characters to <p></p> -->
<xsl:template name="newline-to-paragraph">

    <xsl:param name="input" />

    <xsl:variable name="output">

        <xsl:choose>
            <xsl:when test="contains($input, '&#10;')">
                <xsl:if test="substring-before($input, '&#10;') != ''">
                    <xsl:element name="p"><xsl:copy-of select="substring-before($input, '&#10;')" /></xsl:element>
                </xsl:if>
                <xsl:call-template name="newline-to-paragraph">
                    <xsl:with-param name="input">
                        <xsl:copy-of select="substring-after($input, '&#10;')" />
                    </xsl:with-param>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="$input != ''">
                    <xsl:element name="p"><xsl:copy-of select="$input" /></xsl:element>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:variable>

    <xsl:copy-of select="$output" />

</xsl:template>

Is this possible? I realize the nl2p template runs string functions on the nodeset -- does this destroy the HTML? Can I preserve it or use a specific order of operations to achieve this result?

Thanks in advance.

Edit: I'm using XSLT 1.0

+1  A: 

Well I developed this before I saw your comment that you're stuck with 1.0. But you said you're curious about 2.0, so here it is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
   <xsl:output method="html" indent="yes"/>

   <!-- Identity transform -->
   <xsl:template match="@* | node()">
      <xsl:copy>
         <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
   </xsl:template>

   <!-- surround all other elements with <p> -->
   <xsl:template match="*" priority="1">
      <p><xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy></p>
   </xsl:template>

   <!-- recurse through root and mixed elements, but don't copy them. -->
   <xsl:template match="root | mixed" priority="2">
      <xsl:apply-templates select="node()"/>
   </xsl:template>

   <!-- Surround non-space text content with <p> if there are 
     newlines in the text, or element siblings. -->
   <xsl:template match="text()[contains(., '&#10;') or ../*]">
      <xsl:analyze-string select="." regex="\s*\n\s*">
         <xsl:non-matching-substring>
            <p><xsl:value-of select="."/></p>
         </xsl:non-matching-substring>
      </xsl:analyze-string>
   </xsl:template>

</xsl:stylesheet>

Given the input:

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <mixed html="true">
      line 1

      <a href="http://google.com">line 2</a>

      <em>line 3</em>
   </mixed>
</root>

it yields the desired output:

<p>line 1</p>
<p><a href="http://google.com">line 2</a></p>
<p><em>line 3</em></p>

The only part of this that requires XSLT 2.0 is the <xsl:analyze-string>. In 1.0 you could do a similar thing by writing a template that recursively processes strings: look for newline ('&#10;') characters, apply normalize-space to each piece, and surround the remaining non-empty pieces of text with <p>.
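
A rough sketch of that 1.0 approach, untested against every processor and assuming the element names root and mixed from the question. The template name lines-to-p and its text parameter are made up for illustration. It emits the first line as a <p> if it is non-empty, then recurses on whatever follows the newline:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>
  <!-- Drop the whitespace-only text between <root> and <mixed> -->
  <xsl:strip-space elements="root"/>

  <!-- Recurse through root and mixed without copying them -->
  <xsl:template match="root | mixed">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Element children of mixed: copy unchanged inside a <p> -->
  <xsl:template match="mixed/*">
    <p><xsl:copy-of select="."/></p>
  </xsl:template>

  <!-- Text children of mixed: hand the string to the recursive splitter -->
  <xsl:template match="mixed/text()">
    <xsl:call-template name="lines-to-p">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: one <p> per non-blank line of $text -->
  <xsl:template name="lines-to-p">
    <xsl:param name="text"/>
    <!-- Appending a newline makes substring-before return the whole
         string when $text contains no newline of its own -->
    <xsl:variable name="line"
        select="normalize-space(substring-before(concat($text, '&#10;'), '&#10;'))"/>
    <xsl:if test="$line">
      <p><xsl:value-of select="$line"/></p>
    </xsl:if>
    <!-- Recurse only while there is still a newline to split on -->
    <xsl:if test="contains($text, '&#10;')">
      <xsl:call-template name="lines-to-p">
        <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Note that this treats every child element as its own paragraph, so text on the same line as an element ends up in a separate <p>.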

LarsH
Sorry about that -- thanks very much! Is there any chance that a 1.0 solution exists?
Casey
@Casey, see the last paragraph, which I just added. I have to go to bed or I'd think about writing it. Hopefully Dimitre or Alejandro will be along in a few hours. :-)
LarsH
Ok, thanks for your help. I'll see if I can figure out that last paragraph. I'm still getting familiar with XSLT, so hopefully you come back when you wake up! :)
Casey
+2  A: 

EDIT: Sorry, I forgot to split text nodes!

Most general problem: wrapping non-empty lines of mixed content in a p element

The problem here is how the input tree provider deals with whitespace-only text nodes. Only Saxon seems to preserve whitespace-only text nodes... Of course, adding xml:space="preserve" to the input solves the problem for every other XSLT processor.
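
To check what your own processor does, a small diagnostic like the following may help. It is only a sketch, assuming the container is marked with html="true" as in the question. It lists the children of the container in order, so you can see whether whitespace-only text nodes survived parsing:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Walk the children of every html="true" container in document order -->
    <xsl:for-each select="//*[@html='true']/node()">
      <xsl:choose>
        <!-- A text node whose content is only spaces, tabs or newlines -->
        <xsl:when test="self::text() and not(normalize-space())">[whitespace-only text]</xsl:when>
        <xsl:when test="self::text()">[text]</xsl:when>
        <xsl:when test="self::*">[element <xsl:value-of select="name()"/>]</xsl:when>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If no [whitespace-only text] lines appear for an input that clearly has blank space between two elements, the tree provider has stripped those nodes before the stylesheet ever saw them.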

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" />
    <xsl:preserve-space elements="*" />
    <xsl:template match="*[@html='true' or @nl2p='true']">
        <div>
            <xsl:apply-templates select="node()[1]"/>
        </div>
    </xsl:template>
    <xsl:template match="node()" mode="open" name="open">
        <xsl:copy-of select="." />
        <xsl:apply-templates select="following-sibling::node()[1]" 
                             mode="open" />
    </xsl:template>
    <xsl:template match="*[@html='true' or @nl2p='true']/node()">
        <xsl:param name="pTail" select="''" />
        <p>
            <xsl:value-of select="$pTail" />
            <xsl:call-template name="open" />
        </p>
        <xsl:variable name="vNext" 
        select="following-sibling::text()[contains(., '&#xA;')][1]" />
        <xsl:apply-templates select="$vNext">
            <xsl:with-param name="pString" 
            select="substring-after($vNext, '&#xA;')" />
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="text()[contains(., '&#xA;')]" 
                  mode="open" priority="1">
        <xsl:value-of select="substring-before(., '&#xA;')" />
    </xsl:template>
    <xsl:template match="*[@html='true' or @nl2p='true']
                          /text()[contains(., '&#xA;')]"
                  priority="1" name="text">
        <xsl:param name="pString" select="."/>
        <xsl:choose>
            <xsl:when test="contains($pString, '&#xA;')">
                <xsl:variable name="vOutput" 
                select="normalize-space(substring-before($pString, '&#xA;'))" />
                <xsl:if test="$vOutput">
                    <p>
                        <xsl:value-of select="$vOutput"/>
                    </p>
                </xsl:if>
                <xsl:call-template name="text">
                    <xsl:with-param name="pString"
                    select="substring-after($pString, '&#xA;')" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates select="following-sibling::node()[1]">
                    <xsl:with-param name="pTail" select="$pString" />
                </xsl:apply-templates>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

With this input (more complex than the one in the question):

<root>
    <mixed html="true" xml:space="preserve">
        line 1
        line 2
        <a href="http://google.com">line 2</a> after

        before <em>line 3</em><img src="http://example.org"/>
    </mixed>
</root>

Output:

<div>
<p>line 1</p>
<p>line 2</p>
<p>            <a href="http://google.com">line 2</a> after</p>
<p>            before <em>line 3</em><img src="http://example.org" /></p>
</div>

Reduced problem: wrapping each non-empty line of a text node, and every other child node, in a p element

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*[@html='true']/*">
        <p>
            <xsl:call-template name="identity"/>
        </p>
    </xsl:template>
    <xsl:template match="*[@html='true']/text()" name="text">
        <xsl:param name="pString" select="."/>
        <xsl:choose>
            <xsl:when test="contains($pString,'&#xA;')">
                <xsl:call-template name="text">
                    <xsl:with-param name="pString"
                            select="substring-before($pString,'&#xA;')"/>
                </xsl:call-template>
                <xsl:call-template name="text">
                    <xsl:with-param name="pString"
                            select="substring-after($pString,'&#xA;')"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="normalize-space($pString)">
                <p>
                    <xsl:value-of select="normalize-space($pString)"/>
                </p>
            </xsl:when>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>

With the question's input sample, the output is:

<root>
    <mixed html="true">
        <p>line 1</p>
        <p><a href="http://google.com">line 2</a></p>
        <p><em>line 3</em></p>
    </mixed>
</root>

With my own more complex input (without @xml:space):

<root>
    <mixed html="true">
        line 1
        line 2
        <a href="http://google.com">line 2</a> after

        before <em>line 3</em><img src="http://example.org"/>
    </mixed>
</root>

Output:

<root>
    <mixed html="true">
        <p>line 1</p>
        <p>line 2</p>
        <p><a href="http://google.com">line 2</a></p>
        <p>after</p>
        <p>before</p>
        <p><em>line 3</em></p>
        <p><img src="http://example.org"></img></p>
    </mixed>
</root>
Alejandro
Thanks Alejandro, I appreciate your help and I'm glad you considered a more generalized solution. I was worried I had oversimplified the test case after I read your initial reply. I'm going to compare my results with your method against Dimitre's and see which works for me. It's great to see different ways to tackle the problem.
Casey
@Casey: You are welcome. Ask if anything is unclear!
Alejandro
@Casey: Also, do note that for the reduced problem I've posted a compact stylesheet.
Alejandro
The first stylesheet you posted produces the result I'm looking for. The test case you created is way better than the one I presented, so thanks for that. Now I'm trying to get the result to render in Firefox 3.6 -- I need to remove the <mixed> node but I'm struggling since I don't fully understand the selectors. Any suggestions?
Casey
@Casey: I'm glad it helped you. When using an identity transformation pattern, to remove nodes you need to match those nodes and bypass them, for example: `<xsl:template match="mixed"> <xsl:apply-templates/> </xsl:template>`
Alejandro
When I try this I see some bizarre results, as if there is a problem with the loop or recursion. It's very close but I don't fully understand your code so I'm struggling to debug it. You can see it here: http://caseyk.com/shared/templates/unused/html_nl2p.xml
Casey
@Casey: Sorry, here I'm using a fine-grained traversal identity (node by node). So the bypass rule is `<xsl:template match="*[@html='true']"> <xsl:apply-templates select="node()[1]|following-sibling::node()[1]"/> </xsl:template>` and instead of `<xsl:apply-templates/>` (as in the root node rule) you should use `<xsl:apply-templates select="node()[1]"/>` (apply templates to the first child node only, because each node then applies templates to its own following sibling)
Alejandro
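
For readers following along, the node-by-node traversal described in the comment above might look roughly like this as a complete stylesheet. This is a sketch, not necessarily the exact code from the edited answer. Each node copies itself, descends to its first child, and then steps to its next sibling:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Start the walk at the first child of the document node only -->
  <xsl:template match="/">
    <xsl:apply-templates select="node()[1]"/>
  </xsl:template>

  <!-- Fine-grained identity: copy this node, walk into its first child,
       then move on to its immediately following sibling -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()[1]"/>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::node()[1]"/>
  </xsl:template>

  <!-- Bypass rule: do not copy the container, but keep the walk going
       through its first child and its own next sibling -->
  <xsl:template match="*[@html='true']">
    <xsl:apply-templates select="node()[1]|following-sibling::node()[1]"/>
  </xsl:template>
</xsl:stylesheet>
```

With a plain `<xsl:apply-templates/>` in the bypass rule, every child would be visited once directly and again via its preceding sibling, which would explain duplicated output.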
Cool! I got it working. Thanks. Is there any reason why I would want the identity rule to start at the root instead of *[@html='true']?
Casey
@Casey: You don't need the identity rule at all. It was there just because we like it so much. Ha! See my edit.
Alejandro
@Casey: As you can see, you can start the node-by-node traversal right when you match the container.
Alejandro
Very nice, thanks for the explanation and for your time!
Casey
+1  A: 

A slight correction of your transformation:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" />
    <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*" name="identity">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[@html='true']">
  <div>
    <xsl:apply-templates/>
  </div>
 </xsl:template>

 <xsl:template match="*[@html='true']/*">
  <p><xsl:call-template name="identity"/></p>
 </xsl:template>

 <xsl:template match="*[@html='true']/text()">
  <xsl:call-template name="nl2p"/>
 </xsl:template>

 <xsl:template name="nl2p">
    <xsl:param name="input" select="."/>

    <xsl:variable name="output">
        <xsl:call-template name="newline-to-paragraph">
            <xsl:with-param name="input">
                <xsl:copy-of select="$input" />
            </xsl:with-param>
        </xsl:call-template>
    </xsl:variable>

    <xsl:copy-of select="$output" />
 </xsl:template>

 <!-- convert newline characters to <p></p> -->
 <xsl:template name="newline-to-paragraph">
    <xsl:param name="input" />

    <xsl:variable name="output">
      <xsl:variable name="vlineText"
       select="normalize-space(substring-before($input, '&#10;'))"/>
      <!-- The remainder is left unnormalized: normalize-space() would turn
           its newlines into spaces and later lines would never be split -->
      <xsl:variable name="vtextAfter"
       select="substring-after($input, '&#10;')"/>
        <xsl:choose>
            <xsl:when test="contains($input, '&#10;')">
                <xsl:if test="$vlineText">
                  <p><xsl:copy-of select="$vlineText"/></p>
                </xsl:if>
                <xsl:call-template name="newline-to-paragraph">
                 <xsl:with-param name="input" select="$vtextAfter"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
              <xsl:if test="normalize-space($input)">
                <p><xsl:value-of select="normalize-space($input)" /></p>
              </xsl:if>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:variable>

    <xsl:copy-of select="$output" />
 </xsl:template>

 <xsl:template match="/*">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root>
    <mixed html="true">
        line 1

        <a href="http://google.com">line 2</a>

        <em>line 3</em>
    </mixed>
</root>

produces the wanted result:

<div>
   <p>line 1</p>
   <p>
      <a href="http://google.com">line 2</a>
   </p>
   <p>
      <em>line 3</em>
   </p>
</div>
Dimitre Novatchev
Thanks for this -- much appreciated. I'm going to go over it this weekend and weigh it against Alejandro's solution. It's great to see the different angles of attack; I've learned a lot.
Casey
Although your solution produces the result I initially requested, unfortunately my test case was not sufficient. With Alejandro's more complex example I've realized that his solution is better for my application. Thanks for your time though; seeing your solution helped greatly.
Casey