ansaurus

Question

Trim whitespace from parent element only

Answer 1

+2 A:

You want:

 <xsl:template match="text()">
  <xsl:value-of select=
   "substring(
       substring(normalize-space(concat('[',.,']')),2),
       1,
       string-length(.)
              )"/>
 </xsl:template>

This wraps the string in "[]", then applies `normalize-space()`, and finally removes the wrapping characters.

Dimitre Novatchev 2010-10-22 04:40:36
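
A minimal sketch of the same wrap-and-normalize idea, assuming the intent is to drop exactly the two wrapper characters. Here the length is computed from the normalized string rather than from the original text node, so the closing `]` should not survive when whitespace has been collapsed. The variable name `vWrapped` is made up for illustration.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy everything that is not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <!-- Wrap in brackets so normalize-space() cannot strip the outer whitespace entirely -->
    <xsl:variable name="vWrapped"
                  select="normalize-space(concat('[', ., ']'))"/>
    <!-- Skip the leading '[' and stop before the trailing ']' -->
    <xsl:value-of select="substring($vWrapped, 2, string-length($vWrapped) - 2)"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that this collapses leading and trailing whitespace to a single space instead of removing it, and it also collapses internal runs of whitespace.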

@Dimitre Novatchev - I believe the square brackets were used to demonstrate what it is currently doing (stripping out leading and trailing whitespace from each text node). This doesn't achieve the desired output (which hasn't been clearly stated).

Mads Hansen 2010-10-22 11:56:39

@Mads Hansen: If the wrapping characters are just for illustrative purposes, which seems likely, then they can be removed after applying `normalize-space()`. I updated my answer to do exactly this and I think this is what the OP wants. This is the only answer so far that normalizes the internal whitespaces in a text node.

Dimitre Novatchev 2010-10-22 13:14:27

Interesting idea, but I'm afraid it doesn't actually work -- I get "Hey, ]italics and italics!" when I try? But +1 for the helpful comments to the other answers.

jpatokal 2010-10-24 21:11:15

Answer 2

+2 A:

I would do something like this:

<xsl:template match="p">
    <xsl:apply-templates/>
</xsl:template>

<!-- strip leading whitespace -->
<xsl:template match="p/node()[1][self::text()]">
  <xsl:call-template name="left-trim">
     <xsl:with-param name="s" select="."/>
  </xsl:call-template>
</xsl:template>

This will strip leading whitespace from the first child node of a <p> element, if it is a text node. It will not strip space from the first text node child if that is not the first child node. E.g. in

<p><em>Hey</em> there</p>

I intentionally avoid stripping the space from the front of 'there', because that would make the words run together when rendered in a browser. If you did want to strip that space, change the match pattern to

match="p/text()[1]"

If you also want to strip trailing whitespace, as your title possibly implies, add these two templates:

<!-- strip trailing whitespace -->
<xsl:template match="p/node()[last()][self::text()]">
  <xsl:call-template name="right-trim">
     <xsl:with-param name="s" select="."/>
  </xsl:call-template>
</xsl:template>

<!-- strip leading/trailing whitespace on sole text node -->
<xsl:template match="p/node()[position() = 1 and
                              position() = last()][self::text()]"
              priority="2">
   <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

The definitions of the left-trim and right-trim templates are at Trim Template for XSLT (untested). They might be slow for documents with lots of <p> elements. If you can use XSLT 2.0, you can replace the call-templates with

  <xsl:value-of select="replace(.,'^\s+','')" />

and

  <xsl:value-of select="replace(.,'\s+$','')" />

(Thanks to Priscilla Walmsley.)

LarsH 2010-10-22 11:26:22
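
The linked trim templates are not reproduced in the answer above. As a rough XSLT 1.0 sketch of what `left-trim` and `right-trim` could look like, assuming each takes a single parameter named `s` as in the calls above (this is not the linked implementation, just one possible way to write them):

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical left-trim: drop whitespace characters from the front of $s -->
  <xsl:template name="left-trim">
    <xsl:param name="s"/>
    <xsl:choose>
      <!-- normalize-space() of a single whitespace character is the empty string -->
      <xsl:when test="$s != '' and normalize-space(substring($s, 1, 1)) = ''">
        <xsl:call-template name="left-trim">
          <xsl:with-param name="s" select="substring($s, 2)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical right-trim: drop whitespace characters from the end of $s -->
  <xsl:template name="right-trim">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="$s != '' and
                      normalize-space(substring($s, string-length($s))) = ''">
        <xsl:call-template name="right-trim">
          <xsl:with-param name="s"
                          select="substring($s, 1, string-length($s) - 1)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Each template recurses once per whitespace character removed, so very long runs of whitespace could be slow or hit recursion limits in some processors.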

+1 I don't think it achieves exactly what @jpatokal wants, but it hasn't been stated very clearly. This provides all the information needed to trim the leading space from `p/text()[1]`, which is what I think is wanted.

Mads Hansen 2010-10-22 11:53:50

@LarsH: Good answer. I think you want not `p/node()[1][self::text()]` but `p/node()[self::text()][1]` instead. The same for the last text node.

Dimitre Novatchev 2010-10-22 13:05:51

@Dimitre: wouldn't that either (a) yield the first/last text node, regardless of whether they were "outside" any non-text children; or (b) do the same as what I had? Please explain further, as I would like to understand this better.

LarsH 2010-10-22 14:15:31

@Mads, I don't think he wants to trim the leading space from `p/text()[1]` if p/text()[1] is preceded by an element such as `a`, do you? But I agree, let @jpatokal clarify.

LarsH 2010-10-22 14:19:54

@LarsH: `p/node()[1][self::text()]` means: the first node child of `p` but only if it is a text node. While what you want is: The first of all the text node children of `p`

Dimitre Novatchev 2010-10-22 14:27:25
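
To make the difference between these expressions concrete, here is a small diagnostic sketch. It assumes the input is the `<p><em>Hey</em> there</p>` example from the answer, and it prints what each expression selects relative to `p`.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="p">
    <!-- First child node, kept only if it happens to be a text node -->
    <xsl:text>first child, if text: [</xsl:text>
    <xsl:value-of select="node()[1][self::text()]"/>
    <!-- First among the text-node children (same as text()[1]) -->
    <xsl:text>]&#10;first text child: [</xsl:text>
    <xsl:value-of select="node()[self::text()][1]"/>
    <!-- First text node anywhere below p, in document order -->
    <xsl:text>]&#10;first descendant text: [</xsl:text>
    <xsl:value-of select="descendant::text()[1]"/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

<!-- For <p><em>Hey</em> there</p> this prints:
     first child, if text: []
     first text child: [ there]
     first descendant text: [Hey]
-->
```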

@Dimitre: in other words, the expression you suggest would do (a). However I believe the OP wants to strip "the first node child of p but only if it is a text node". E.g. in `<p><em>Hey</em> there</p>` we should not strip the ' ' before 'there', because then it would be rendered with no space between 'Hey' and 'there'. But maybe @jpatokal will clarify.

LarsH 2010-10-22 14:34:33

@LarsH: I believe your solution leaves the spaces in the first text node of: ` Hey there`, while the OP wants them stripped-off. This is due to the fact that you are processing the first child node only if it is also a text node and in this case `em` is not a text node but is the first child node.

Dimitre Novatchev 2010-10-22 16:32:05

@Dimitre: I agree with you on what my code does, which is what I understand "from parent element only" to mean. (Note that your suggested expression, `p/node()[self::text()][1]`, does not strip the first space from your example ` Hey there` either.) I guess we disagree on what the OP wants. Given that you and @Alej have taken good and somewhat different stabs at what you believe @jpatokal wants, I won't spend more time on speculative solutions until/unless the OP clarifies.

LarsH 2010-10-22 21:30:09

The basic requirement boils down to "never *start* with whitespace", even if it's wrapped in a few containing tags. Space between tags should not be stripped.

jpatokal 2010-10-24 21:21:39

However, your solution (with the XSLT 2 replace) is not working for me, eg. plain Foo turns into null? Is $arg magic or is the definition for it missing?

jpatokal 2010-10-24 21:22:57

@jpatokal: Sorry, `$arg` should be `.` in the `replace()` call. I'll edit this to fix.

LarsH 2010-10-25 11:28:57

@jpatokal: 'Space between tags should not be stripped.' At face value that seems to contradict '"never start with whitespace", even if it's wrapped in a few containing tags.' Maybe you mean 'space between non-space text should not be stripped'?

LarsH 2010-10-25 11:31:20

"Space between tags" = between the closing of one tag and the opening of another, eg. `[ here ]`. Not the same as `[ this ]`.

jpatokal 2010-10-26 02:54:50

@jpatokal: ok. It would be helpful next time if you would define your requirements more accurately from the beginning, so we avoid wasting time implementing the wrong ones. However, I understand that sometimes, defining the requirements correctly is the biggest part of the problem.

LarsH 2010-10-26 10:55:10

@jpatokal: Am I right in thinking you *do* want to strip space between the closing of one tag and the opening of another if there has not been any text yet? E.g. in ` Hi`. In which case the defining requirement is as you said, the text should never start with whitespace, regardless of the level of embedding; space *after text* should not be stripped.

LarsH 2010-10-26 11:25:06

No. "Never start with whitespace" -- if there's a complete tag before the whitespace, then it's not starting with whitespace.

jpatokal 2010-10-27 03:27:46

@jpatokal - So "never start with whitespace" ignores multiple opening tags (despite the question title), but does not ignore complete or close tags. Why? I inferred that the goal was HTML that would not render an initial space; but apparently that's not it.

LarsH 2010-10-27 14:13:51

A complete or close tag will (presumably) render something, so the HTML does not render an initial space. In other news, the horse is dead. Stop beating it and move on.

jpatokal 2010-10-29 04:00:33

@jpatokal, ``, as in my example, does not render anything. You're free to stop responding whenever you like.

LarsH 2010-10-29 14:48:19

@jpatokal: "Stop beating it and move on." I and others spent our time trying to help you at your request. Some of that time was in vain because your question was underspecified. We took the risk of trying to infer the specs and in some cases were wrong. Having spent that time, I'm still interested in nailing down a consistent definition of what the task actually was. Haven't seen it yet. If you're not interested in that, you move on, but rudeness on your part is unjustified.

LarsH 2010-10-29 19:41:18

Answer 3

+4 A:

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p//text()[1][generate-id()=
                                      generate-id(ancestor::p[1]
                                                  /descendant::text()[1])]">
        <xsl:variable name="vFirstNotSpace"
                      select="substring(normalize-space(),1,1)"/>
        <xsl:value-of select="concat($vFirstNotSpace,
                                     substring-after(.,$vFirstNotSpace))"/>
    </xsl:template>
</xsl:stylesheet>

Output:

<p>Hey, <em>italics</em> and <em>italics</em>!</p>

Edit 2: Better expression (now only three function calls).

Edit 3: Matching the first descendant text node (not just the first node if it's a text node). Thanks to @Dimitre's comment.

Now, with this input:

<p><b>  Hey, </b><em>italics</em> and <em>italics</em>!</p>

Output:

<p><b>Hey, </b><em>italics</em> and <em>italics</em>!</p>

Alejandro 2010-10-22 13:11:34

Wow. :-) I think I see what the nested substring() calls are doing, and it's much better than a recursive template. +1

LarsH 2010-10-22 14:18:12

@Alejandro: I think you have the same issue as Lars: I think you want not `p/node()[1][self::text()]` but `p/node()[self::text()][1]` instead.

Dimitre Novatchev 2010-10-22 17:25:27

@Dimitre: That would be the same as `p/text()[1]`, but I know what you mean.

Alejandro 2010-10-22 17:55:07

+1 I think we have a winner! That is what I understand the desired output to be. Very nice solution.

Mads Hansen 2010-10-22 17:55:10

@Alejandro: Not exactly, consider: ` Hello `. This would be: `(p//text())[1]`

Dimitre Novatchev 2010-10-22 19:06:56

@Alejandro: Oh, I see that you have fixed this. Good, +1

Dimitre Novatchev 2010-10-22 19:08:59

Don't know about performance, but it could be just `text()[generate-id()=generate-id(ancestor::p[1]/descendant::text()[1])]`. Or with keys: `<xsl:key name="kIsPFirstDescendant" match="text()" use="generate-id(ancestor::p[1]/descendant::text()[1])"/><xsl:template match="text()[key('kIsPFirstDescendant',generate-id())]">...`

Alejandro 2010-10-22 19:23:02
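
Spelled out as a complete stylesheet, the key-based variant from the comment above might look like the following sketch. It reuses the identity rule and the trimming expression from the answer; only the layout and comments are added here, and it has not been benchmarked against the original pattern.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every text node by the id of the first text descendant
       of its nearest p ancestor -->
  <xsl:key name="kIsPFirstDescendant" match="text()"
           use="generate-id(ancestor::p[1]/descendant::text()[1])"/>

  <!-- Identity rule: copy everything else unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- A text node matches if some key entry carries its own id,
       i.e. it is the first text descendant of its nearest p -->
  <xsl:template match="text()[key('kIsPFirstDescendant', generate-id())]">
    <xsl:variable name="vFirstNotSpace"
                  select="substring(normalize-space(), 1, 1)"/>
    <!-- Keep the text from the first non-whitespace character onward -->
    <xsl:value-of select="concat($vFirstNotSpace,
                                 substring-after(., $vFirstNotSpace))"/>
  </xsl:template>

</xsl:stylesheet>
```

As with the original expression, a first text node that consists only of whitespace is copied unchanged, because `$vFirstNotSpace` is then the empty string.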

A bit more complicated than I was hoping, but seems to work like a charm. Thanks!

jpatokal 2010-10-24 21:13:26

@jpatokal: You are welcome. As an aside: complicated? An identity rule and only one other rule? The pattern is a bit complex because of the axis restrictions in patterns: you can't write `p/descendant::text()[1]` in a pattern, so I've reversed it.

Alejandro 2010-10-24 21:19:36

I suppose it's simple by the insane standards of XSLT, but in any sensible programming language this would be `trim()` or `strip()`...

jpatokal 2010-10-24 21:29:33

@jpatokal: You are mixing things, I think... If you are referring to ltrim or rtrim kinds of functions, I'll give you that. It looks like we can manage with just `fn:normalize-space()`. I think this XPath 1.0 expression is not so bad: `substring-after(.,substring-before(.,substring(normalize-space(),1,1)))`, meaning *the string after the string that precedes the first non-whitespace character*. But as for the whole process (*copy everything as is, but left-trim whitespace in every text node that is the first descendant of a `p`*), I really don't think you could express it more compactly.

Alejandro 2010-10-24 21:55:39
