Hello,
I've been racking my brain over this but can't seem to get it right, and I'm not hitting the correct keywords on Google.
I've recently started to play around with XSLT and XPath to create an XML description of natural language glossaries – for a project of mine.
The problem is that I have chosen to use 'mixed content' complex elements for some words and in some instances want to fetch just the text node.
Here's a portion of the XML document:
...
<entry category="substantiv">
<word lang="sv">semester</word>
<word lang="de">
<article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
</entry>
...
There are many entry elements in my document, and in this instance I want to fetch 'Urlaub' by using: /entry/word[@lang='de']/text(), which won't work because of my line breaks. I've discovered that there are actually three text nodes, so .../text()[2] will work, of course. However, I don't know beforehand where there will be line breaks, or how many. If the XML is formatted like the following, my first version of the path will work but not the second:
...
<word lang="de"><article>der</article>Urlaub
<plural>Urlaube</plural>
</word>
...
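To make the extra text nodes visible, a small diagnostic stylesheet along these lines can help. This is only a sketch: the //entry path is an assumption (the elements enclosing entry are elided above), and it assumes an XSLT 1.0 processor with no xsl:strip-space in effect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Print each direct text-node child of the German word inside
       brackets, so whitespace-only nodes show up as well.
       Assumption: entry elements may sit anywhere in the document. -->
  <xsl:template match="/">
    <xsl:for-each select="//entry/word[@lang='de']/text()">
      <xsl:value-of select="position()"/>
      <xsl:text>: [</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>]&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the first formatting this lists three nodes, the first and last holding only whitespace. With the second formatting the leading whitespace-only node is gone, which is why text()[2] no longer points at 'Urlaub'.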
What I think I want to do is select all the immediate text nodes of word[@lang='de'], and then remove unnecessary white space using normalize-space(). However, how do I do this using XPath? Or is there a better way? It seems like it should be easy, but I can't figure it out. By the way, I am trying to do this within an XSLT document.
normalize-space(/entry/word[@lang='de']/text()[*]) is one of the things I have tried, but that seems to do something else.
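A note on that last attempt: in XPath 1.0 the [*] predicate keeps only nodes that have at least one element child. A text node never has children, so text()[*] selects nothing and normalize-space() returns an empty string. The sketch below just counts the nodes to show this; the //entry path is again an assumption, because the surrounding elements are elided.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//entry/word[@lang='de']">
      <!-- Every direct text-node child, whitespace-only ones included. -->
      <xsl:text>text(): </xsl:text>
      <xsl:value-of select="count(text())"/>
      <!-- Text nodes that have an element child: there are none. -->
      <xsl:text>&#10;text()[*]: </xsl:text>
      <xsl:value-of select="count(text()[*])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The second count is 0 however the XML is formatted, while the first varies with the line breaks.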
/Grateful for any help.
Update 1:
Here is part of the XSLT, as requested:
...
<xsl:choose>
  <xsl:when test="@category='substantiv'">
    <em><xsl:value-of select="word[@lang='de']/article" /></em>
    <xsl:value-of select="normalize-space(word[@lang='de']/text()[2])" />
    <em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
  </xsl:when>
...
This code works just fine with the first version of the formatting. To clarify, what I want to do is grab the value of the text node in the complex element <word lang="de">, however it might be formatted with line breaks and white space. What I will do with the value depends on context, but right now I will just put it in an XHTML document.
Update 2:
I am now using <xsl:strip-space elements="*"/>, which eliminates the problem of having whitespace-only text nodes. I am also using:
...
<xsl:choose>
  <xsl:when test="@category='substantiv'">
    <em><xsl:value-of select="word[@lang='de']/article" /></em>
    <xsl:text> </xsl:text>
    <xsl:value-of select="normalize-space(word[@lang='de']/text())" />
    <xsl:text>, </xsl:text>
    <em>pl. <xsl:value-of select="word[@lang='de']/plural" /></em>
  </xsl:when>
...
I still have to normalize, though, since whitespace still follows "Urlaub" in the XML.
When I need to reach the text node "Urlaub" outside of the XSLT document, I use:
<xsl:value-of select="normalize-space(word[@lang='de']/text()[normalize-space() != ''])" />
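Putting the pieces together, a self-contained version might look like the sketch below. It assumes an XSLT 1.0 processor, uses plain-text output instead of XHTML to keep it short, and selects //entry because the enclosing elements are elided above. The extra [1] is an addition that makes explicit that only the first non-blank text node is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//entry[@category='substantiv']">
      <xsl:value-of select="word[@lang='de']/article"/>
      <xsl:text> </xsl:text>
      <!-- text()[normalize-space()] drops whitespace-only text nodes;
           the outer normalize-space() trims the trailing line break. -->
      <xsl:value-of select="normalize-space(
          word[@lang='de']/text()[normalize-space()][1])"/>
      <xsl:text>, pl. </xsl:text>
      <xsl:value-of select="word[@lang='de']/plural"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample entry this should print "der Urlaub, pl. Urlaube" with either of the two formattings shown earlier, and it does not depend on xsl:strip-space being set.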
Thanks for all the help folks!
Update 3: Tried to improve the title