ansaurus

Question

Mechanism to strip specific tags from an XHTML document (but keep their contents)?

Answer 1

+3 A:

"I need to maintain the structure of the original document minus the stripped tags"

Have you thought of XSLT? It is a language designed specifically for transforming XML documents and, more generally, tree structures.

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="b">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>

when applied to an XHTML document such as the one below:

<html>
 <head/>
 <body>
  <p> Hello, <b>World</b>!</p>
 </body>
</html>

produces the wanted result, with the <b> tag removed and its contents kept in place:

<html>
   <head/>
   <body>
      <p> Hello, World!</p>
   </body>
</html>
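
One caveat about the sample above: it declares no namespace, whereas a real XHTML document usually puts xmlns="http://www.w3.org/1999/xhtml" on its root element. In XSLT 1.0 an unprefixed name such as b in a match pattern matches only elements in no namespace, so the template above would not fire on such a document. A minimal sketch of a namespace-aware variant follows. The prefix x and the choice of b, i and span as the tags to strip are assumptions made for illustration, not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml">
 <xsl:output method="xml" omit-xml-declaration="yes"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Override: for the listed XHTML elements, output only their children.
      The element list is hypothetical; adjust it to the tags to strip. -->
 <xsl:template match="x:b|x:i|x:span">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>
```

The override template is more specific than the identity rule, so it is chosen for the listed elements. Because xsl:apply-templates without a select attribute processes child nodes only, any attributes on the stripped elements are dropped along with the tags.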

Dimitre Novatchev 2010-09-05 01:57:37

I had thought of XSLT; in fact, I just updated the question to reflect that, because I had mistakenly called it XPath. However, I couldn't think of a good XSLT for the problem. Apparently you have the solution. I will try it ....

John K 2010-09-05 02:16:45

@John-K: You are welcome. Please, don't hesitate to ask if there is something that needs to be explained. :)

Dimitre Novatchev 2010-09-05 03:47:09

... works like a charm. Thanks. I'm using it via the XslCompiledTransform Class http://msdn.microsoft.com/en-us/library/system.xml.xsl.xslcompiledtransform(v=VS.90).aspx

John K 2010-09-05 20:42:46

@Dimitre: `<xsl:strip-space elements="*"/>` is a bit too much for HTML/XHTML: whitespace-only text nodes inside `<pre>` elements will be stripped as well. So I suggest removing that line.

dolmen 2010-09-09 15:21:14
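
If dropping the line entirely is not wanted, one alternative is to keep xsl:strip-space and exempt pre with xsl:preserve-space. This is a sketch that was not proposed in the thread, and it assumes a document without a namespace, like the sample in the answer. In XSLT 1.0 a named element in xsl:preserve-space takes precedence over the * wildcard in xsl:strip-space. The decision is made per parent element, though, so whitespace-only text inside an element nested in pre (for example a span) would still be stripped unless that element is listed too.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

 <!-- Strip whitespace-only text nodes everywhere ... -->
 <xsl:strip-space elements="*"/>
 <!-- ... except those whose parent is a pre element -->
 <xsl:preserve-space elements="pre"/>

 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Drop the b tags but keep their contents -->
 <xsl:template match="b">
  <xsl:apply-templates/>
 </xsl:template>
</xsl:stylesheet>
```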
