I have an XML/TEI document like this:

    <p> In trattoria scoprii che c'era <del rend="tratto a matita">anche</del> Mirella,
                non la non vedevo da almeno sei anni. 
                La spianata dava infatti l'impressione di fango secco, <del rend="matita">divorato
                    dalle rughe</del><add place="margine sinistro" rend="matita">attraversato da
                    lunghe ferite nere</add>. Lontano si vedeva una montagna di creta dello
                stesso colore della mota. </p>

I am using this stylesheet to remove whitespace, both between elements and inside text nodes:

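    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- drop whitespace-only text nodes between elements -->
        <xsl:strip-space elements="*"/>

        <xsl:template match="/">
            <xsl:apply-templates/>
        </xsl:template>

        <!-- copy every element, normalizing whitespace in its attribute values -->
        <xsl:template match="*">
            <xsl:copy>
                <xsl:for-each select="@*">
                    <xsl:attribute name="{name()}">
                        <xsl:value-of select="normalize-space()"/>
                    </xsl:attribute>
                </xsl:for-each>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:template>

        <!-- collapse whitespace inside text nodes (this also trims both ends) -->
        <xsl:template match="text()">
            <xsl:value-of select="normalize-space()"/>
        </xsl:template>

    </xsl:stylesheet>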

Everything works except that normalize-space() also removes leading and trailing whitespace, so I get undesired output like

    c'era<del rend="tratto a matita">anche</del>Mirella
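To see why the spaces disappear, here is a rough Python emulation of XPath normalize-space() applied to the text nodes on either side of <del>. The helper name normalize_space and the shortened text excerpts are assumptions for illustration; a real XSLT processor works on the parsed tree, not on Python strings.

```python
import re

# Hypothetical helper: emulates XPath normalize-space() on one text node.
# XPath treats only space, tab, CR and LF as whitespace, so the class is spelled out.
def normalize_space(s: str) -> str:
    # collapse every whitespace run to one space, then trim both ends
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Excerpts of the text nodes before and after <del>anche</del>:
before_del = " In trattoria scoprii che c'era "
after_del = " Mirella,\n                non la non vedevo"

print(f"[{normalize_space(before_del)}]")  # [In trattoria scoprii che c'era]
print(f"[{normalize_space(after_del)}]")   # [Mirella, non la non vedevo]
```

Once the trimmed pieces are serialized next to <del>anche</del>, nothing separates them any more, which is exactly the c'era...Mirella output above.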

I can't exclude mixed content from the normalization, because my main need is to collapse whitespace such as line breaks, tabs and indentation INSIDE, say, the <p> element.

Is there a function or trick to collapse runs of whitespace into a single space without removing the leading and trailing whitespace?

A:

I don't think there is a built-in function that does this directly, but XPath 2.0 has a fairly complete regular-expression language and a replace() function that you should be able to make do what you want (xml.com has a more readable introduction). Note that replace() is not available in XPath 1.0, so this needs an XSLT 2.0 processor.

I think all you need to do is replace:

    select="normalize-space()"

with

    select="replace(., '(\s\s+)', ' ')"

but I've not tested this.

Edit: Fixed the first argument in replace, as noted by Mycol below.
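A rough Python emulation of what that replace() call does to a single text node, with Python standing in for an XSLT 2.0 processor such as Saxon (collapse_runs and collapse_all are hypothetical names). It shows that a single leading or trailing space survives, and also that '\s\s+' leaves a lone tab or newline untouched, whereas '\s+' would turn those into a space as well.

```python
import re

# In XPath regexes \s matches only space, tab, CR and LF, so the class is spelled out.
WS = r"[ \t\r\n]"

def collapse_runs(s: str) -> str:
    # emulates replace($s, '(\s\s+)', ' '): only runs of 2+ whitespace chars change
    return re.sub(f"({WS}{WS}+)", " ", s)

def collapse_all(s: str) -> str:
    # variant emulating replace($s, '\s+', ' '): any run, even one tab, becomes a space
    return re.sub(f"{WS}+", " ", s)

node = " Mirella,\n                non la"
print(f"[{collapse_runs(node)}]")  # [ Mirella, non la]

lone = "c'era\tanche\n"
print(collapse_runs(lone) == lone)  # True: single tab/newline are not runs of two
print(f"[{collapse_all(lone)}]")    # [c'era anche ]
```

In both variants the ends are never trimmed, so the space between a text node and an adjacent <del> or <add> element is kept.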

Andrew Walker
Thank you. It worked, but with select="replace(., '(\s\s+)', ' ')".
Mycol
Excellent. I'd be interested to know what you are using to process the XSLT, as I'm not 100% sure this is fully portable.
Andrew Walker
Saxon 9.2 Home Edition
Mycol