Hello,

I have some XML documents (similar to DocBook) that have to be transformed to XSL-FO. Some of the documents contain poems, and the lines of the poems are written in separate p tags. The verses are separated by br tags. There are "page" tags that are irrelevant and should be ignored.

Typical code example:

<h4>Headline</h4>
<p>1st line of 1st verse</p>
<p>2nd line of 1st verse</p>
<br/>
<p>1st line of 2nd verse</p>
<p>2nd line of 2nd verse</p>
<page n="100"/>
<p>3rd line of 2nd verse</p>
<h4>Other headline</h4>

For the XSL-FO output, I would like to gather all the text of a verse into one single fo:block. Right now the mechanism works for code structures as above, but there are some exceptions. The current approach is to decide for every p tag: am I the first line of a verse?

- If yes: collect all the text of this verse and write it into a fo:block, using the attributes of the current (first) p tag to set the formatting of the block.
- If no: the contents were handled earlier, so do nothing.

A first line is a p tag that is immediately preceded by an h4 or a br tag (or by a page tag which itself is immediately preceded by a br tag). That one was easy to develop.
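The first-line rule can be written as a match pattern. This is only a minimal sketch of the rule as described, not the actual stylesheet; the empty fo:block is a placeholder for the collecting step, and the fallback template for all other p tags is an assumption about how "do nothing" is realised.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

    <!-- A p that opens a verse: directly after an h4 or br,
         or directly after a page that itself directly follows a br. -->
    <xsl:template match="p[preceding-sibling::*[1][self::h4 or self::br]
                           or preceding-sibling::*[1][self::page]
                              /preceding-sibling::*[1][self::br]]">
        <fo:block>
            <!-- placeholder: the text of the whole verse is collected here -->
        </fo:block>
    </xsl:template>

    <!-- Assumed fallback: any other p was already consumed by the first
         line of its verse, so it produces no output of its own. -->
    <xsl:template match="p"/>

</xsl:stylesheet>
```

The pattern with the predicate has a higher default priority than the plain match="p", so it wins whenever both apply.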

Collecting the text of a verse was easy for the given example: group all following siblings, ending each group at an h4 or br tag, then take the first group and use all its p tags (ignoring any page tags in between and the closing h4 or br tag).

In code:

<xsl:for-each-group select="following-sibling::*" group-ending-with="br|h4">
    <xsl:if test="position()=1">
        <xsl:for-each select="current-group()[not(self::h4) and not(self::br) and not(self::page)]">
            <xsl:apply-templates/>&crt;
        </xsl:for-each>
    </xsl:if>
</xsl:for-each-group>

Now to a similar code example:

<h4>Headline</h4>
<p class="center">1</p>
<p>1st line of 1st verse</p>
<p>2nd line of 1st verse</p>
<br/>
<p class="center">2</p>
<p>1st line of 2nd verse</p>
<p>2nd line of 2nd verse</p>
<page n="100"/>
<p>3rd line of 2nd verse</p>
<h4>Other headline</h4>

Now the centered p tags act like subheadlines to the following verses. A subheadline is not really a verse, but for my purposes it would be enough if it were separated from the real verse's text. Thus the slightly varied rule for getting all the text of the current verse is: group all following siblings, ending each group at an h4 or br tag or at a p tag that has a different class than the current p tag, then take the first group and use all its p tags (ignoring any page tags in between and the closing h4 or br tag).

Therefore I stored the value of the class attribute of the current p tag in a variable called attributes and defined the group rule as:

<xsl:for-each-group select="following-sibling::*" group-ending-with="br|h4|p[normalize-space(@class) != $attributes]">

In turn, when determining whether a p tag is the first line of a verse, it may be preceded not only by an h4 or br, but also by another p tag that has a different class attribute value.
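A minimal sketch of this extended first-line test, written as an xsl:if inside a template rather than as a pattern, so that the variable is only used in an ordinary XPath expression. The variable name prev is made up for illustration, and the empty fo:block again stands in for the collecting step.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

    <xsl:template match="p">
        <!-- The element directly before this p, if there is one
             ("prev" is a hypothetical name). -->
        <xsl:variable name="prev" select="preceding-sibling::*[1]"/>
        <!-- First line of a verse: after h4 or br, after a page that
             follows a br, or after a p with a different class value. -->
        <xsl:if test="$prev[self::h4 or self::br]
                      or $prev[self::page]/preceding-sibling::*[1][self::br]
                      or $prev[self::p][normalize-space(@class)
                                        != normalize-space(current()/@class)]">
            <fo:block>
                <!-- placeholder: collect the text of the verse here -->
            </fo:block>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet>
```

A missing class attribute is treated as the empty string by normalize-space(), so a p without a class counts as different from a p with class="center".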

Now this works fine in my testing environment in Oxygen using Saxon-B9.1.0.6. But the transformation has to be performed in Java using Saxon9.jar, and there the use of a variable inside the group-ending-with attribute of xsl:for-each-group causes an exception.

And now I am kind of stuck.

Could the grouping conditions be defined in a better way? Or should this maybe not be done with grouping at all, but with a totally different approach?

The source files are as they are, the tagging might not be optimal, but it is as it is. The transformation is not new but was subsequently adapted to our needs. Source code with poems in it was simply avoided earlier, but I'd like to find a solution for this.

Any help would be greatly appreciated.

Best regards,

Christian Kirchhoff

A: 

This stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="div[@class='poem']">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:for-each-group select="*" group-ending-with="br|h4">
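                <!-- Skip groups without verse lines, such as the one holding only the h4. -->
                <xsl:if test="current-group()/self::p[not(@class)]">
                    <div class="strophe">
                        <xsl:copy-of select="current-group()/self::p[not(@class)]"/>
                    </div>
                </xsl:if>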
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

With this input:

<div class="poem">
    <h4>Headline</h4>
    <p>1st line of 1st verse</p>
    <p>2nd line of 1st verse</p>
    <br/>
    <p>1st line of 2nd verse</p>
    <p>2nd line of 2nd verse</p>
    <page n="100"/>
    <p>3rd line of 2nd verse</p>
</div>

Output:

<div class="poem">
    <div class="strophe">
        <p>1st line of 1st verse</p>
        <p>2nd line of 1st verse</p>
    </div>
    <div class="strophe">
        <p>1st line of 2nd verse</p>
        <p>2nd line of 2nd verse</p>
        <p>3rd line of 2nd verse</p>
    </div>
</div>

With this input:

<div class="poem">
    <h4>Headline</h4>
    <p class="center">1</p>
    <p>1st line of 1st verse</p>
    <p>2nd line of 1st verse</p>
    <br/>
    <p class="center">2</p>
    <p>1st line of 2nd verse</p>
    <p>2nd line of 2nd verse</p>
    <page n="100"/>
    <p>3rd line of 2nd verse</p>
</div>

Output:

<div class="poem">
    <div class="strophe">
        <p>1st line of 1st verse</p>
        <p>2nd line of 1st verse</p>
    </div>
    <div class="strophe">
        <p>1st line of 2nd verse</p>
        <p>2nd line of 2nd verse</p>
        <p>3rd line of 2nd verse</p>
    </div>
</div>
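
The stylesheet above simply drops the centered lines. If they should be kept, but separated from the verse text, one possible variant is to nest a second grouping with group-adjacent on the class value, which needs no variable in any pattern. This is only a sketch under that assumption; the class name "subheadline" for the output div is invented for illustration.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="div[@class='poem']">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- Outer grouping: one group per strophe, closed by br or h4. -->
            <xsl:for-each-group select="*" group-ending-with="br|h4">
                <!-- Inner grouping: only the p elements of this strophe,
                     split into runs that share the same class value.
                     page elements are not selected, so they do not
                     interrupt a run. -->
                <xsl:for-each-group select="current-group()/self::p"
                                    group-adjacent="normalize-space(@class)">
                    <!-- "subheadline" is a hypothetical class name. -->
                    <div class="{if (current-grouping-key() = '')
                                 then 'strophe' else 'subheadline'}">
                        <xsl:copy-of select="current-group()"/>
                    </div>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With the second input, each centered number would land in its own div ahead of the strophe it introduces, and a group that contains no p at all (such as the one holding only the h4) produces no div.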

So, this stylesheet:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="div[@class='poems']">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:for-each-group select="*[preceding-sibling::h4]"
                                group-starting-with="h4">
                <div class="poem">
                    <xsl:for-each-group select="current-group()"
                                        group-ending-with="br">
                        <div class="strophe">
                            <xsl:copy-of select="current-group()
                                                  /self::p[not(@class)]"/>
                        </div>
                    </xsl:for-each-group>
                </div>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

With this input:

<div class="poems">
    <h3>Poems</h3>
    <h4>Headline</h4>
    <p>1st line of 1st verse</p>
    <p>2nd line of 1st verse</p>
    <br/>
    <p>1st line of 2nd verse</p>
    <p>2nd line of 2nd verse</p>
    <page n="100"/>
    <p>3rd line of 2nd verse</p>
    <h4>Headline</h4>
    <p class="center">1</p>
    <p>1st line of 1st verse</p>
    <p>2nd line of 1st verse</p>
    <br/>
    <p class="center">2</p>
    <p>1st line of 2nd verse</p>
    <p>2nd line of 2nd verse</p>
    <page n="100"/>
    <p>3rd line of 2nd verse</p>
</div>

Output:

<div class="poems">
    <div class="poem">
        <div class="strophe">
            <p>1st line of 1st verse</p>
            <p>2nd line of 1st verse</p>
        </div>
        <div class="strophe">
            <p>1st line of 2nd verse</p>
            <p>2nd line of 2nd verse</p>
            <p>3rd line of 2nd verse</p>
        </div>
    </div>
    <div class="poem">
        <div class="strophe">
            <p>1st line of 1st verse</p>
            <p>2nd line of 1st verse</p>
        </div>
        <div class="strophe">
            <p>1st line of 2nd verse</p>
            <p>2nd line of 2nd verse</p>
            <p>3rd line of 2nd verse</p>
        </div>
    </div>
</div>
Alejandro