I am attempting to scan a string of words and look for the presence of a particular word (case-insensitive) in an XSLT 2.0 stylesheet using regular expressions.

I have a list of words that I wish to iterate over and determine whether or not they exist within a given string.

I want to match a word anywhere within the given text, but I do not want to match inside another word (e.g. a search for foo should not match "food", and a search for bar should not match "rebar").

The XSLT 2.0 regex dialect does not have a word boundary (\b), so I need to replicate it as best I can.

A: 

This is what I have come up with so far:

    <!--
        [^a-zA-Z] attempts to prevent matching within words when used as a prefix or suffix to a regex expression
    -->
    <xsl:variable name="b" select="'[^a-zA-Z]'" />

    <!--
        ^ ensures that the leading word can be matched
        $ ensures that trailing word will be matched
    -->
    <xsl:if test="matches($prose, concat('^', $word, $b), 'i')
                  or matches($prose, concat($b, $word, '$'), 'i')
                  or matches($prose, concat($b, $word, $b), 'i')">

        <!-- We found the word -->

    </xsl:if>

This seems to be working reasonably well, but I am repeating virtually the same expression 3 times, which is not very efficient.
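
One possible way to cut the repetition while keeping the `[^a-zA-Z]` delimiter is to pad the text with a space on each side, so that a word at the very start or end always has a non-letter neighbour. The sketch below is only an illustration: the default values of `$prose` and `$word` and the named template `main` are assumptions made to keep the example self-contained. It also covers the case where `$prose` consists of nothing but the word, which none of the three expressions above would match.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Illustrative inputs (assumed); the real values come from your own data -->
  <xsl:param name="prose" select="'foo'"/>
  <xsl:param name="word" select="'foo'"/>

  <!-- Any character that is not an ASCII letter counts as a delimiter -->
  <xsl:variable name="b" select="'[^a-zA-Z]'"/>

  <xsl:template name="main">
    <!-- The padding spaces stand in for the ^ and $ anchors -->
    <xsl:if test="matches(concat(' ', $prose, ' '), concat($b, $word, $b), 'i')">
      <xsl:text>found</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the padded string always begins and ends with a delimiter, a single `matches()` call is enough.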

Mads Hansen
+1  A: 

You can use alternation to avoid repetition:

<xsl:if test="matches($prose, concat('(^|\W)', $word, '($|\W)'),'i')">
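
Since the question mentions iterating over a list of words, the alternation pattern could be wrapped in a stylesheet function and called once per word. This is a minimal sketch under stated assumptions: the function name `my:contains-word`, the `urn:example:my` namespace, and the sample text and word list are all hypothetical. It also escapes regex metacharacters in `$word` before building the pattern, which matters only if a search word could contain characters such as `.` or `+`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: true if $word occurs in $prose as a whole word -->
  <xsl:function name="my:contains-word" as="xs:boolean">
    <xsl:param name="prose" as="xs:string"/>
    <xsl:param name="word" as="xs:string"/>
    <!-- Prefix each regex metacharacter with a backslash so $word is taken literally -->
    <xsl:variable name="literal"
        select="replace($word, '[\\\.\?\*\+\(\)\{\}\[\]\^\$\|]', '\\$0')"/>
    <xsl:sequence
        select="matches($prose, concat('(^|\W)', $literal, '($|\W)'), 'i')"/>
  </xsl:function>

  <xsl:template name="main">
    <!-- Sample data, assumed for illustration only -->
    <xsl:variable name="prose" select="'Rebar and food, then foo.'"/>
    <xsl:for-each select="('foo', 'bar', 'FOOD')">
      <xsl:value-of
          select="concat(., ': ', my:contains-word($prose, .), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With this sample text, `foo` and `FOOD` should be reported as found (the match is case-insensitive), while `bar` should not, because it only occurs inside "Rebar".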
Tim Pietzcker
That expression would not compile because XSLT 2.0 regular expressions don't have non-capturing groups. However, \W is supported, and using alternation in the first and third groups does simplify things nicely. The following expression does work: `matches($prose, concat('(^|\W)', $word, '($|\W)'),'i')`
Mads Hansen
OK, thanks. Will edit my answer :)
Tim Pietzcker
A: 

If your XSLT 2.0 processor is Saxon 9, then you can use Java regular expression syntax (including \b) with the functions matches, tokenize and replace by starting the flags argument with an exclamation mark:

<xsl:value-of select="matches('all foo is bar', '\bfoo\b', '!i')"/>

Michael Kay mentioned that option recently on the XSL mailing list.

Martin Honnen
@Martin - have you gotten it to work? I see that Michael Kay said it is an "undocumented, largely untested, and completely non-conformant option". When I run it against Saxon 9.2.0.3 in oXygen 11 it throws an error: `net.sf.saxon.trans.XPathException: Invalid character '!' in regular expression flags - Invalid character '!' in regular expression flags` Start location: 881:0 URL: http://www.w3.org/TR/2005/WD-xpath-functions-20050211/#ERRFORX0002
Mads Hansen
Yes, it works fine for me with Saxon 9.2 HE, tested (now) with both 9.2.0.5 and 9.2.0.3, run from the command line with e.g. java -jar saxon9he.jar. I am not sure why you get that error, but that URL referring to a working draft (WD) version of the XPath functions might indicate you are running an older version of Saxon than you believe.
Martin Honnen