ansaurus

Question

How do I pretty print an XSLT result document with removed source elements?

Answer 1

+1 A:

The white space you see comes from the source document. XSLT's built-in rules copy text nodes to the output, whether or not they consist only of white space. To override the built-in rule, include:

<xsl:template match="text()" />

Alternatively: Spot any <xsl:apply-templates /> (or <xsl:apply-templates select="node()" />) and explicitly specify which children you want to apply templates to. This method might be necessary if your transformation partly relies on the identity template (in which case the empty template for text nodes would be counter-productive).
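
To illustrate the alternative, here is a minimal sketch. It assumes the goal is to drop the `svg:svg` and `svg:foreignObject` wrappers from your snippet while keeping what is inside them; the `svg` namespace URI and the choice of elements to unwrap are my assumptions, so adjust them to your stylesheet. The wrapper template applies templates to element children only (`select="*"`), so the white space inside the removed elements is never visited, while the identity template keeps handling everything else.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">

  <xsl:output method="xml" />

  <!-- identity template: copies every node that no other template matches -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- assumed wrappers to remove: emit nothing for the element itself and
       continue with its element children only, skipping its text nodes -->
  <xsl:template match="svg:svg | svg:foreignObject">
    <xsl:apply-templates select="*" />
  </xsl:template>

</xsl:stylesheet>
```

White space that sits directly inside an element the identity template copies (here, directly inside the `div`) is still carried over; only the text nodes inside the unwrapped elements are skipped.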

I have marked up the "insignificant" white space in your snippet the way Word would do it:

<div id="container">¶
····<svg:svg>¶
········<svg:foreignObject>¶
············<img />¶
········</svg:foreignObject>¶
····</svg:svg>¶
</div>

EDIT: You can also modify your identity template like this:

<xsl:template match="node() | @*">
  <xsl:copy>
    <!-- select everything except blank text nodes -->
    <xsl:apply-templates select="
      node()[not(self::text())] | text()[normalize-space() != ''] | @*
    " />
  </xsl:copy>
</xsl:template>

This removes every whitespace-only text node (attribute values remain untouched; they are not text nodes). Use <xsl:output indent="yes" /> to pretty-print the result.

Tomalak 2010-02-09 09:58:09

I am using the identity template—if I match `text()`, all of my content disappears. However, I'm not sure what you mean by the alternative; can you give me an example?

Hugh Guiney 2010-02-11 04:46:16

@Hugh: If your stylesheet heavily relies on the identity template, I recommend @Josh Davis' approach. I've created a shorter and more correct variant of it (he uses an unconditional `<xsl:copy-of select="@*" />`, which is not ideal).

Tomalak 2010-02-11 07:45:15

Thank you; I see what you mean. But unfortunately that did not work either. The result is all on one line, even with `<xsl:output indent="yes" />` set.

Hugh Guiney 2010-02-11 08:12:10

Tomalak 2010-02-11 09:07:54

You're right: it looks like libxslt doesn't honor `indent="yes"` when `method="html"` is set. (PHP also has a bug where `$DOMDocument->formatOutput=true` has no effect on `$DOMDocument->saveHTML()`.) I have tried formatting via Tidy, but that isn't working either; it puts every element on a new line but doesn't indent them unless I set `indent=true`, which is not what I want since it also indents the contents of block-level elements. But I guess that's a separate issue.

Hugh Guiney 2010-02-11 10:56:24

@Hugh: The point is that whitespace in HTML should be inherently insignificant. Presentation should not suffer from code layout (most of the time it does not, with the notable exception of adjacent inline elements). If you are only after nicely formatted source code, maybe you are taking it one step too far.

Tomalak 2010-02-11 11:52:24

Answer 2

+1 A:

You have two ways to achieve your desired result: either fix your original transformation to handle whitespace differently, or keep the transformation as-is and add a second pass that prettifies the output. If your original transformation is complicated, I'd recommend the two-pass approach. Making the transformation even more complicated risks creating corner cases where you don't get the desired result, which then need more special-case handling and can introduce bugs into something that used to work.

You should be able to ignore the whitespace-only text nodes by testing them with normalize-space(). Here's what the second pass could look like. If you go with the one-pass approach, I'd guess the code will be roughly the same.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="yes" />

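    <!-- explicit priority: text() and node() both default to -0.5, so without it
         the later node() rule would win for text nodes and copy the whitespace -->
    <xsl:template match="text()" priority="1">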
        <xsl:if test="normalize-space(.) != ''">
            <xsl:value-of select="."/>
        </xsl:if>
    </xsl:template>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
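
If you would rather not run two separate transformations, the two passes can probably be combined in one stylesheet. The following is only a sketch under assumptions: it relies on the EXSLT `exsl:node-set()` extension function (available in libxslt, but not part of XSLT 1.0 itself), the mode name `clean` is made up, and the default-mode identity template is just a placeholder for your real first-pass templates.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" indent="yes" />

  <!-- driver: note that this replaces any match="/" template you already have -->
  <xsl:template match="/">
    <!-- pass 1: run the existing (default-mode) templates into a variable -->
    <xsl:variable name="pass1">
      <xsl:apply-templates />
    </xsl:variable>
    <!-- pass 2: re-process the top-level elements of that result in "clean" mode -->
    <xsl:apply-templates select="exsl:node-set($pass1)/*" mode="clean" />
  </xsl:template>

  <!-- placeholder for the first pass: put your real templates here -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- second pass: copy everything except whitespace-only text nodes -->
  <xsl:template match="node() | @*" mode="clean">
    <xsl:copy>
      <xsl:apply-templates mode="clean" select="
        node()[not(self::text())] | text()[normalize-space() != ''] | @*
      " />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the second pass only ever sees the output of the first, elements removed by the first pass cannot reappear.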

Josh Davis 2010-02-09 10:33:07

The first template, by itself, does ALMOST what I want, but it also removes all of the relevant indentation. But the second template puts the elements I've removed BACK into the result tree.

Hugh Guiney 2010-02-11 04:29:16

Also, the parser choked on the expression `normalize-space(.) != \'\'`. If I un-escaped the single-quotes, it worked.

Hugh Guiney 2010-02-11 04:57:38

@Hugh: That's because the code snippet shown here is part of a string definition in some programming language, not standalone XSLT (see, it even ends with `;`)

Tomalak 2010-02-11 07:23:01

Ah. Thought that was a typo. Thanks.

Hugh Guiney 2010-02-11 08:34:28

Sorry, you're right, it was a typo: I forgot to clean up the string after testing it in PHP. Originally I was about to post the short PHP snippet, then I realized all you needed was the XSL, but I forgot to remove the escaped quotes. It's fixed now.

Josh Davis 2010-02-11 13:29:18

Oh, and you're supposed to run that on *the result* of your first transformation. Not the source document.

Josh Davis 2010-02-11 13:30:07
