You would probably be better off doing this programmatically rather than with pure XSLT, but if you have to use XSLT, here is one way to do it. It involves multiple stylesheets, although if you were able to use extension functions, you could make use of node-sets and combine them into one big (and nasty) stylesheet.
The first stylesheet copies the initial XML, but 'tokenises' any text it finds, so that each word in the text becomes a separate 'WORD' element.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Copy existing nodes and attributes -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Match text nodes -->
    <xsl:template match="text()">
        <xsl:call-template name="tokenize">
            <xsl:with-param name="string" select="."/>
        </xsl:call-template>
    </xsl:template>
    <!-- Splits a string into separate elements for each word -->
    <xsl:template name="tokenize">
        <xsl:param name="string"/>
        <xsl:param name="delimiter" select="' '"/>
        <xsl:choose>
            <xsl:when test="$delimiter and contains($string, $delimiter)">
                <xsl:variable name="word" select="normalize-space(substring-before($string, $delimiter))"/>
                <xsl:if test="string-length($word) > 0">
                    <WORD>
                        <xsl:value-of select="$word"/>
                    </WORD>
                </xsl:if>
                <xsl:call-template name="tokenize">
                    <xsl:with-param name="string" select="substring-after($string, $delimiter)"/>
                    <xsl:with-param name="delimiter" select="$delimiter"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="word" select="normalize-space($string)"/>
                <xsl:if test="string-length($word) > 0">
                    <WORD>
                        <xsl:value-of select="$word"/>
                    </WORD>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
I took the XSLT template used to 'tokenize' a string of text from this question:
tokenizing-and-sorting-with-xslt-1-0
(Note that XSLT 2.0 has a built-in tokenize() function, which would simplify the above.)
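For what it is worth, here is a minimal sketch of how the first stylesheet might look in XSLT 2.0, assuming a 2.0-capable processor is available. I have not run it against the original input, and it splits on any run of whitespace rather than on single spaces only, so treat it as a starting point rather than a drop-in replacement.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Copy existing nodes and attributes -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Wrap each word of a text node in a WORD element.
         normalize-space() collapses whitespace runs to single spaces,
         so splitting on ' ' yields one token per word. -->
    <xsl:template match="text()">
        <xsl:for-each select="tokenize(normalize-space(.), ' ')[. != '']">
            <WORD>
                <xsl:value-of select="."/>
            </WORD>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```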
The first stylesheet would give you XML like so...
<PUBLDES>
    <WORD>The</WORD>
    <IT>
        <WORD>European</WORD>
        <WORD>Journal</WORD>
        <WORD>of</WORD>
        ....
And so on...
Next, it is a case of traversing this XML document using another XSLT stylesheet, outputting only up to the first 45 WORD elements. To do this, I repeatedly apply a template, keeping a running total of the number of WORD elements found so far. When matching a node, there are three possibilities:
- Match a WORD element: Output it. Carry on processing from the next sibling if the total is not reached.
- Match an element where the number of words below it is within the remaining total: Copy the whole element, and then carry on processing from the next sibling if the total is not reached.
- Match an element where the number of words below it would exceed the total: Copy the current node (but not its children) and continue processing at its first child.
Here is the style sheet, in all its hideousness
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Maximum number of words to output (6 here for demonstration) -->
    <xsl:variable name="WORDCOUNT">6</xsl:variable>
    <!-- Match root element -->
    <xsl:template match="/">
        <xsl:apply-templates select="descendant::*[1]" mode="word">
            <xsl:with-param name="previousWords">0</xsl:with-param>
        </xsl:apply-templates>
    </xsl:template>
    <!-- Match any node -->
    <xsl:template match="node()" mode="word">
        <xsl:param name="previousWords"/>
        <!-- Number of words below the element (at any depth) -->
        <xsl:variable name="childWords" select="count(descendant::WORD)"/>
        <xsl:choose>
            <!-- Matching a WORD element -->
            <xsl:when test="local-name(.) = 'WORD'">
                <!-- Copy the word -->
                <WORD>
                    <xsl:value-of select="."/>
                </WORD>
                <!-- If there are still words to output, continue processing at next sibling -->
                <xsl:if test="$previousWords + 1 &lt; $WORDCOUNT">
                    <xsl:apply-templates select="following-sibling::*[1]" mode="word">
                        <xsl:with-param name="previousWords">
                            <xsl:value-of select="$previousWords + 1"/>
                        </xsl:with-param>
                    </xsl:apply-templates>
                </xsl:if>
            </xsl:when>
            <!-- Match a node where the number of words below it is within allowed limit -->
            <xsl:when test="$childWords &lt;= $WORDCOUNT - $previousWords">
                <!-- Copy the element -->
                <xsl:copy>
                    <!-- Copy its attributes and child elements (with all their descendants) -->
                    <xsl:copy-of select="*|@*"/>
                </xsl:copy>
                <!-- If there are still words to output, continue processing at next sibling -->
                <xsl:if test="$previousWords + $childWords &lt; $WORDCOUNT">
                    <xsl:apply-templates select="following-sibling::*[1]" mode="word">
                        <xsl:with-param name="previousWords">
                            <xsl:value-of select="$previousWords + $childWords"/>
                        </xsl:with-param>
                    </xsl:apply-templates>
                </xsl:if>
            </xsl:when>
            <!-- Match nodes where the number of words below it would exceed current limit -->
            <xsl:otherwise>
                <!-- Copy the node -->
                <xsl:copy>
                    <!-- Continue processing at very first child node -->
                    <xsl:apply-templates select="descendant::*[1]" mode="word">
                        <xsl:with-param name="previousWords">
                            <xsl:value-of select="$previousWords"/>
                        </xsl:with-param>
                    </xsl:apply-templates>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
If you were outputting just the first 4 words, say (that is, with WORDCOUNT set to 4), this would give you the following output
<PUBLDES>
    <WORD>The</WORD>
    <IT>
        <WORD>European</WORD>
        <WORD>Journal</WORD>
        <WORD>of</WORD>
    </IT>
</PUBLDES>
Of course, you would then need yet another transformation to remove the WORD elements and leave just the text. This should be fairly straightforward.
This is all very nasty, but it is the best I could come up with for now!
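A minimal sketch of that last transformation, assuming that a single space between words is acceptable (the original spacing is lost when the text is tokenised, so it cannot be recovered exactly):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Ignore whitespace-only text nodes (e.g. indentation) in the input -->
    <xsl:strip-space elements="*"/>
    <!-- Copy everything that is not a WORD element -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Replace each WORD element with its text, putting a single space
         before every word except the first one in the document -->
    <xsl:template match="WORD">
        <xsl:if test="preceding::WORD">
            <xsl:text> </xsl:text>
        </xsl:if>
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the four-word output above, this should give something like <PUBLDES>The<IT> European Journal of</IT></PUBLDES>. Note that the separating space ends up inside the IT element, which may or may not matter for your purposes.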
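If your processor does support the EXSLT node-set() extension function, the first two steps could be chained along the following lines. This is only a sketch: the file names tokenize.xsl and truncate.xsl are hypothetical names for the two stylesheets above, and some processors provide node-set() under a different namespace (MSXML, for example, uses its own), so check what yours offers.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl"
                version="1.0">
    <!-- Hypothetical file names for the two stylesheets shown above -->
    <xsl:import href="tokenize.xsl"/>
    <xsl:import href="truncate.xsl"/>
    <!-- Overrides the root template of truncate.xsl (higher import precedence) -->
    <xsl:template match="/">
        <!-- Pass 1: tokenise the input into a result tree fragment -->
        <xsl:variable name="tokenized">
            <xsl:apply-templates select="node()"/>
        </xsl:variable>
        <!-- Pass 2: convert the fragment to a node-set and truncate it,
             starting at its first element as the original root template does -->
        <xsl:apply-templates select="exsl:node-set($tokenized)/*[1]" mode="word">
            <xsl:with-param name="previousWords" select="0"/>
        </xsl:apply-templates>
    </xsl:template>
</xsl:stylesheet>
```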