ansaurus

Question

Selecting a child text node amongst white space text nodes, in a complex XML element using XPath

Answer 1

A:

Now that I see your code I recommend this:

<xsl:choose>
  <xsl:when test="@category='substantiv'">
    <em><xsl:value-of select="word[@lang='de']/article" /></em>^
    <!-- select the first text node that is not white-space-only and normalize it -->
    <xsl:value-of select="normalize-space(word[@lang='de']/text()[normalize-space() != ''][1])" />
    <em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
  </xsl:when>
  <!-- other categories omitted -->
</xsl:choose>

Original Version of the answer

To get you started:

<entry category="substantiv">
  <word lang="sv">semester</word>
  <word lang="de">
    <article>der</article>Urlaub
    <plural>Urlaube</plural>
  </word>
</entry>

When passed through this XSLT 1.0:

<!-- identity template copies everything 1:1, unless other templates apply;
     node() is needed so that text nodes are copied as well -->
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*" />
  </xsl:copy>
</xsl:template>

<!-- empty template: ignore every white-space-only text-node child of <word> -->
<xsl:template match="word/text()[normalize-space() = '']" />

Would produce this:

<entry category="substantiv">
  <word lang="sv">semester</word>
  <word lang="de"><article>der</article>Urlaub
    <plural>Urlaube</plural></word>
</entry>

This answer is a guess and may not be exactly what you are after. Your question needs clarification in any case. What you think you want is not always what you actually want.

Tomalak 2010-08-09 18:45:36

Ah, yes, I was not clear at all. I didn't want to change the formatting, only handle different formatting scenarios. But you helped me with something else, so your answer was still useful. Thanks! :)

nimbus77 2010-08-09 21:26:48

@nimbus: Did you notice that the top section of my answer changed?

Tomalak 2010-08-09 21:37:26

Yes I did, that change does the trick. Thanks for helping out. I'm a bit confused now though as to how exactly text() is supposed to work, but I'll start a new question tomorrow for that if I can't figure it out.

nimbus77 2010-08-09 21:58:17

@nimbus: `text()` is, despite the parentheses, not a function. At least not the way you probably think it would be. It selects text nodes, the same way as `foo` would select `<foo>` elements. The parentheses are a way to separate it from `text`, which would select `<text>` elements.

Tomalak 2010-08-10 12:06:19

@Tomalak: Yeah, I was fooled by that. I also found out today that it is called a node test. I had also assumed it would automatically concatenate the text nodes into one string, the way ending the XPath with `word[@lang="de"]` does. But now I know better. :)

nimbus77 2010-08-10 14:22:39
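
As a rough illustration of the node-test point discussed in these comments, here is a minimal XSLT 1.0 sketch. It assumes the `<entry>` document from the answer above as input, and a parser that preserves white-space-only text nodes (some processors strip them by default). The stylesheet itself is only an illustration and is not part of the original answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/entry">
    <!-- text() is a node test: it selects the text-node children of <word lang="de">
         (3 of them when white-space-only nodes are preserved) -->
    <xsl:text>text node children: </xsl:text>
    <xsl:value-of select="count(word[@lang='de']/text())"/>
    <xsl:text>&#10;</xsl:text>

    <!-- the string value of the element itself is the concatenation of ALL
         descendant text, including the text inside <article> and <plural> -->
    <xsl:text>string value: [</xsl:text>
    <xsl:value-of select="normalize-space(word[@lang='de'])"/>
    <xsl:text>]&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Under those assumptions the second line should read `[derUrlaub Urlaube]`: selecting the element concatenates everything below it, whereas `text()` hands back the individual text nodes.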

Answer 2

A:

Try:

/entry/word[@lang='de']/child::text()[normalize-space(.) != '']

Meaning, grab all child text nodes but not those that normalize to an empty string.

-Oisin

x0n 2010-08-09 18:50:54

Mentioning the `child::` axis is superfluous. Also, `normalize-space()` operates on the current node by default, so passing it explicitly through `.` is not necessary.

Tomalak 2010-08-09 18:53:13

@x0n, typing word[@lang='de']/text()[normalize-space() != ''] does the trick. Thanks!

nimbus77 2010-08-09 21:20:26
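
A small sketch of the equivalence mentioned in the comments, assuming the `<entry>` document from the question as input. The variable names `long` and `short` are made up for this illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/entry">
    <!-- explicit child:: axis and explicit "." argument -->
    <xsl:variable name="long"
      select="word[@lang='de']/child::text()[normalize-space(.) != '']"/>
    <!-- abbreviated form: child:: is the default axis, "." the default argument -->
    <xsl:variable name="short"
      select="word[@lang='de']/text()[normalize-space() != '']"/>

    <!-- a union never contains duplicates, so three equal counts mean
         that both expressions select exactly the same node(s) -->
    <xsl:value-of select="count($long)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($short)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count($long | $short)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should print `1 1 1`, since only the `Urlaub` text node survives the predicate in either spelling.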

Answer 3

A:

I think this is the skeleton of what you want, minus any normalize-space() to get things to look exactly the way you want.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates select=".//word"/>
  </xsl:template>
  <xsl:template match="word">
    <xsl:apply-templates select=".//text()"/>
  </xsl:template>
  <xsl:template match="text()"><xsl:value-of select="."/><xsl:text> </xsl:text></xsl:template>  
</xsl:stylesheet>

The key is the .//text() which returns the concatenation of ALL child text nodes at any nesting level below the context node().

Jim Garrison 2010-08-09 20:07:31

That's what I thought `.//text()` would do too. Maybe I'm doing it wrong? If I use `<xsl:value-of select="normalize-space(word[@lang='de']//text())" />` (I haven't started using templates yet, but I'm going to) I get nothing. But if I test it in my XPath evaluator, it finds 5 text nodes, since 'der' and 'Urlaube' are also included.

nimbus77 2010-08-09 21:41:45

@Jim: *"The key is the `.//text()` which returns the concatenation of ALL child text nodes"* - Actually, that's wrong. `//text()` *selects* all the text nodes; it returns a node-set of separate nodes, not a concatenated string.

Tomalak 2010-08-10 12:02:58
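
This also explains the "I get nothing" observation above. In XPath 1.0, converting a node-set to a string uses only the first node in document order, and here that first node is the white space before `<article>`. A minimal sketch, assuming XSLT 1.0 semantics, the `<entry>` document from the question as input, and a parser that keeps white-space-only text nodes:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/entry">
    <!-- node-set to string: only the FIRST text node is used, and that one
         is white-space-only, so the brackets come out empty -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="normalize-space(word[@lang='de']//text())"/>
    <xsl:text>]&#10;</xsl:text>

    <!-- visiting the non-blank text nodes one by one, in document order -->
    <xsl:for-each select="word[@lang='de']//text()[normalize-space() != '']">
      <xsl:text>[</xsl:text>
      <xsl:value-of select="normalize-space()"/>
      <xsl:text>]</xsl:text>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line should come out as `[]` and the second as `[der][Urlaub][Urlaube]`, which shows that the nodes are selected separately and nothing is concatenated automatically.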

Answer 4

+2 A:

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:value-of select="/*/entry/word[@lang='de']/text()[1]"/>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document (wrapped in a dict top element):

<dict>
    <entry category="substantiv">
        <word lang="sv">semester</word>
        <word lang="de">
            <article>der</article>Urlaub
            <plural>Urlaube</plural>
        </word>
    </entry>
</dict>

produces exactly the wanted result:

Urlaub

Do note: the use of the <xsl:strip-space> instruction to eliminate all white-space-only text nodes from the source XML document.

Therefore, no additional processing (normalize-space(), etc) is necessary.

Dimitre Novatchev 2010-08-09 20:52:19

That was a really nice solution. Vielen Dank! :)

nimbus77 2010-08-09 21:53:17

Turns out there is still white space after "Urlaub" but that is not a problem.

nimbus77 2010-08-09 22:45:04
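
Regarding the remaining white space: `<xsl:strip-space>` only removes text nodes that consist entirely of white space, so the line break and indentation that follow "Urlaub" inside the same text node are kept. If that ever matters, one possible variation (a sketch, not part of the answer above) is to wrap the selection in `normalize-space()`. The input is assumed to be the `<dict>` document shown in the answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- removes white-space-only text nodes from the source tree -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- text()[1] is now the "Urlaub" node; normalize-space() trims the
         trailing line break and indentation that strip-space leaves in place -->
    <xsl:value-of select="normalize-space(/*/entry/word[@lang='de']/text()[1])"/>
  </xsl:template>
</xsl:stylesheet>
```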
