I have a pair of custom self-closing tags, s1 and s2, defined in namespace x in my XHTML. For each s1/s2 pair with the same id, I want to wrap all the text nodes between the two tags in span tags. Each s1/s2 pair has a unique id. I am looking for an XSL-based solution. I am using the Saxon Java processor for XSL.

Sample input:

<html xmlns="http://www.w3.org/1999/xhtml">
<head> 
<title>This is my title</title> 
</head> 
<body> 
<h1 align="center"> 
  This is my heading 
</h1> 
<p> 
  Sample content Some text here. Some content here. 
</p> 
<p> 
   Here you go. 
</p> 
</body> 
</html> 

Sample Output:

<html xmlns="http://www.w3.org/1999/xhtml">
<head> 
<title>This is my title</title> 
</head> 
<body> 
<h1 align="center"> 
  This <span class="spanClass" id="1">is my</span>heading 
</h1> 
<p> 
  Sample content <span class="spanClass" id="2">Some text here. Some content here.</span> 
</p> 
<p> 
   <span class="spanClass" id="3">Here you</span>go. 
</p> 
</body> 
</html> 
+2  A: 

EDIT: Modified answer to work with the XHTML sample you've added.

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:x="http://www.w3.org/1999/xhtml" 
  xmlns="http://www.w3.org/1999/xhtml" 
  exclude-result-prefixes="x"
>
  <xsl:output method="xml" omit-xml-declaration="yes" encoding="utf-8" />

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()[normalize-space()]">
    <xsl:choose>
      <xsl:when test="preceding::x:s1[1][
        not(following::x:s2[1][
          following::text()[generate-id() = generate-id(current())]
        ])
      ]">
        <span class="spanClass" id="{generate-id()}">
          <xsl:copy-of select="." />
        </span>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="x:s1|x:s2" />

</xsl:stylesheet>

Result (line breaks/indentation for readability):

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>This is my title</title>
  </head>
  <body>
    <h1 align="center">
      This <span class="spanClass" id="IDATA2Q">is my</span>heading
    </h1>
    <p>
      Sample content <span class="spanClass" id="IDA2A2Q">Some text here. Some content here.</span>
    </p>
    <p>
       <span class="spanClass" id="IDA5A2Q">Here you</span>go.
    </p>
  </body>
</html>
Tomalak
@Rachel: And this is exactly the reason why you should *not* "make up" XML, but use a real-world example. It is easy to miss essential details when making up stuff to "simplify" the question, resulting in double work for everybody. What a waste, don't you think?
Tomalak
Your solution isn't bad, too. +1 from me :)
Dimitre Novatchev
@Rachel: Simply modify your example XML to reflect reality.
Tomalak
@Dimitre: Thanks. I was quite proud of the XPath expression until you had to come around and shred my approach. ;-)
Tomalak
@Rachel: See my modified answer. I'm not really convinced that your sample actually reflects reality, though. There are no `<s1>` or `<s2>` elements in XHTML, so your input document is not valid. Can it be that these two are in a different namespace?
Tomalak
Yes you are right.
Rachel
@Tomalak: The only change I can see in the XSL, compared to the previous version, is the addition of the namespace. Does this make a major difference?
Rachel
@Rachel: Seriously, this makes baby Jesus cry. I've asked you to include a complete, well-formed and real-world sample to your question instead of something you just made up. You changed your question, only to tell me now that the new XML in your question might *not yet* be the real thing and that you *might* have forgotten some namespaces. I'm not going to modify my answer for a third time just because you keep changing the question in mid-air. Next time, please make up your mind before you ask a question here, because this "iterative" approach is an incredible waste of time. Sorry.
Tomalak
@Rachel: I read "I'm trying to test the feasibility" as "I'm not even sure if we're going to do it with XSLT, or at all even, but let's ask some guys on the Internet and see what they come up with". Sorry, wrong attitude. I've now made that final edit to my answer that I actually wanted to avoid. It works for the sample you have supplied in that it produces the output you want. Please don't bother posting more details or clarifications, if this answer does not do it for you, I can't help you.
Tomalak
Thanks for your time and sorry for the inconvenience caused.
Rachel
+4  A: 

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[(preceding::s1 | preceding::s2)[last()][self::s1]]">
  <span class="spanClass" id="{generate-id()}">
   <xsl:copy-of select="."/>
  </span>
 </xsl:template>

 <xsl:template match="s1|s2"/>
</xsl:stylesheet>

when applied on the original XML document (corrected to well-formed):

<a>
  <b>Some <s1 id="1" />text here</b>
  <c>Some <s2 id="1"/>more text <s1 id="2"/> here</c>
  <d>More data</d>
  <e>Some <s2 id="2" />more data</e>
</a>

produces the wanted output:

<a>
  <b>Some <span class="spanClass" id="IDANI2QB">text here</span></b><span class="spanClass" id="IDAOI2QB">
  </span><c><span class="spanClass" id="IDAQI2QB">Some </span>more text <span class="spanClass" id="IDAWI2QB"> here</span></c><span class="spanClass" id="IDAXI2QB">
  </span><d><span class="spanClass" id="IDAYI2QBIDAYI2QB">More data</span></d><span class="spanClass" id="IDAZI2QB">
  </span><e><span class="spanClass" id="IDA1I2QB">Some </span>more data</e>
</a>
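
To see why the match pattern selects exactly those text nodes, it can help to evaluate its predicate on its own. For a given text node, `(preceding::s1 | preceding::s2)` collects every marker that comes before it in document order, `[last()]` keeps the nearest one, and `[self::s1]` succeeds only if that nearest marker is an opening `s1`. The following is a minimal sketch that runs the same expression through the JDK's built-in XPath 1.0 engine (not Saxon) against the sample document above. The class name `SpanMatchDemo` and its helper method are made up for illustration, and whitespace-only hits are dropped only to keep the printout readable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original answer.
final class SpanMatchDemo {

    // The well-formed sample document from the answer.
    static final String SAMPLE =
          "<a>\n"
        + "  <b>Some <s1 id=\"1\" />text here</b>\n"
        + "  <c>Some <s2 id=\"1\"/>more text <s1 id=\"2\"/> here</c>\n"
        + "  <d>More data</d>\n"
        + "  <e>Some <s2 id=\"2\" />more data</e>\n"
        + "</a>";

    // Same predicate as the template's match pattern, applied to every text node.
    static final String MATCH =
        "//text()[(preceding::s1 | preceding::s2)[last()][self::s1]]";

    // Returns the non-blank text nodes selected by MATCH, in document order.
    static List<String> matchedText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                              .evaluate(MATCH, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                String value = hits.item(i).getNodeValue();
                if (!value.trim().isEmpty()) {   // skip whitespace-only nodes
                    result.add(value);
                }
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String text : matchedText(SAMPLE)) {
            System.out.println("[" + text + "]");
        }
    }
}
```

The selected nodes should line up with the non-whitespace spans in the output above: "text here", "Some ", " here", "More data" and "Some ". The stylesheet itself also wraps the whitespace-only text nodes that fall between an `s1` and its `s2`, which is where the extra spans containing only line breaks come from.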
Dimitre Novatchev
+1 Well played, sir. :)
Tomalak
BTW, can you explain why the same node has a longer-than-usual id in our results?
Tomalak
@Rachel: Please, update your question with a more refined definition of the problem, or ask another question. The lesson in this is to think before asking a question, how best to define the problem.
Dimitre Novatchev
@Tomalak: The value of generate-id() is implementation dependent. I used MSXML4 here. Using Saxon probably will generate completely different ids.
Dimitre Novatchev
@Dimitre: I was expecting you'd use MSXML because of the similarity (I'm using msxsl.exe most of the time), but I've never seen a generated ID with double length, and I suspected you knew the condition that triggers this.
Tomalak
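
On the generate-id() point raised in the comments: XSLT 1.0 only promises that the function returns a name-like string that is the same for the same node and different for different nodes within one transformation, so the exact values (and their length) vary between processors. Here is a minimal sketch that prints the ids produced by whatever XSLT processor JAXP picks up, which is the JDK's built-in one unless another processor such as Saxon is configured. The stylesheet, the input document and the class name `GenerateIdDemo` are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of the original thread.
final class GenerateIdDemo {

    // Illustrative stylesheet: writes generate-id() for each <item>, one per line.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//item'>"
        + "<xsl:value-of select='generate-id()'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Illustrative input with three distinct nodes.
    static final String XML = "<list><item/><item/><item/></list>";

    // Runs the transformation and returns the generated ids in document order.
    static String[] ids() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString().trim().split("\\s+");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The concrete strings depend on the processor; only uniqueness is guaranteed.
        for (String id : ids()) {
            System.out.println(id);
        }
    }
}
```

Because the values are processor-specific, it is safer not to depend on their exact form, for example in expected test output or in anything that outlives a single transformation run.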