I have some XML with <ListItem> elements, and I'd like to wrap any consecutive runs with <List> elements. So, source XML would look something like this:

<Section>
  <Head>Heading</Head>
  <Para>Blah</Para>
  <ListItem>item 1</ListItem>
  <ListItem>item 2</ListItem>
  <ListItem>item 3</ListItem>
  <ListItem>item 4</ListItem>
  <Para>Something else</Para>
</Section>

And I'd want to convert it to something like this:

<Section>
  <Head>Heading</Head>
  <Para>Blah</Para>
  <List>
    <ListItem>item 1</ListItem>
    <ListItem>item 2</ListItem>
    <ListItem>item 3</ListItem>
    <ListItem>item 4</ListItem>
  </List>
  <Para>Something else</Para>
</Section>

using XSLT. I'm sure it's obvious but I can't work it out at this time in the evening. Thanks!


Edit: this can be safely ignored by most people.

This XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root>
  <Story>
    <Section id="preface">
      <ChapterTitle>Redacted</ChapterTitle>
      <HeadA>Redacted</HeadA>
      <Body>Redacted</Body>
      <BulletListItem>Item1</BulletListItem>
      <BulletListItem>Item2</BulletListItem>
      <BulletListItem>Item3</BulletListItem>
      <BulletListItem>Item4</BulletListItem>
      <HeadA>Redacted</HeadA>
      <Body>Redacted</Body>
      <HeadA>Redacted</HeadA>
      <Body>Redacted</Body>
      <Body>Redacted<Italic>REDACTED</Italic>Redacted</Body>
      <BulletListItem>Second list Item1</BulletListItem>
      <BulletListItem>Second list Item2</BulletListItem>
      <BulletListItem>Second list Item3</BulletListItem>
      <BulletListItem>Second list Item4</BulletListItem>
      <Body>Redacted</Body>
    </Section>
  </Story>
</Root>

With this XSL:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing" match="BulletListItem[preceding-sibling::*[1][self::BulletListItem]]"
  use="generate-id(preceding-sibling::BulletListItem
         [not(preceding-sibling::*[1][self::BulletListItem])])"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="BulletListItem
         [not(preceding-sibling::*[1][self::BulletListItem])]">
  <BulletList>
    <xsl:call-template name="identity"/>
    <xsl:apply-templates mode="copy" select="key('kFollowing', generate-id())"/>
  </BulletList>
 </xsl:template>

 <xsl:template match="BulletListItem[preceding-sibling::*[1][self::BulletListItem]]"/>

 <xsl:template match="BulletListItem" mode="copy">
  <xsl:call-template name="identity"/>
 </xsl:template>
</xsl:stylesheet>

When processed with Ruby's REXML and XML/XSLT libraries, it produces this XML (pretty-printed output):

<Root>
  <Story>
    <Section id='preface'>
      <ChapterTitle>
        Redacted
      </ChapterTitle>
      <HeadA>
        Redacted
      </HeadA>
      <Body>
        Redacted
      </Body>
      <BulletList>
        <BulletListItem>
          Item1
        </BulletListItem>
        <BulletListItem>
          Item2
        </BulletListItem>
        <BulletListItem>
          Item3
        </BulletListItem>
        <BulletListItem>
          Item4
        </BulletListItem>
        <BulletListItem>
          Second list Item2
        </BulletListItem>
        <BulletListItem>
          Second list Item3
        </BulletListItem>
        <BulletListItem>
          Second list Item4
        </BulletListItem>
      </BulletList>
      <HeadA>
        Redacted
      </HeadA>
      <Body>
        Redacted
      </Body>
      <HeadA>
        Redacted
      </HeadA>
      <Body>
        Redacted
      </Body>
      <Body>
        Redacted
        <Italic>
          REDACTED
        </Italic>
        Redacted
      </Body>
      <BulletList>
        <BulletListItem>
          Second list Item1
        </BulletListItem>
      </BulletList>
      <Body>
        Redacted
      </Body>
    </Section>
  </Story>
</Root>

You'll see that the two lists get jammed together and the bit in between gets lost. Not sure if this is a bug in the Ruby libraries or in your XSLT.

+2  A: 

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:strip-space elements="*"/>
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()[1]"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::node()[1]"/>
    </xsl:template>
    <xsl:template match="ListItem">
        <List>
            <xsl:call-template name="ListItem"/>
        </List>
        <xsl:apply-templates select="following-sibling::node()
                                      [not(self::ListItem)][1]"/>
    </xsl:template>
    <xsl:template match="ListItem[preceding-sibling::node()[1]
                                              /self::ListItem]"
                  name="ListItem">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()[1]"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::node()[1]
                                                 /self::ListItem"/>
    </xsl:template>
</xsl:stylesheet>

Output:

<Section>
    <Head>Heading</Head>
    <Para>Blah</Para>
    <List>
        <ListItem>item 1</ListItem>
        <ListItem>item 2</ListItem>
        <ListItem>item 3</ListItem>
        <ListItem>item 4</ListItem>
    </List>
    <Para>Something else</Para>
</Section>

Edit 3: Using strip-space for what it is meant to do.

Alejandro
Sorry I couldn't pick both!
Skilldrick
@Alejandro - strange... When I run this stylesheet on the given input, I get something very different. In particular, your `select="following-sibling::node()[1][self::ListItem]"` misses the following ListItem because the first following sibling node is a whitespace text node. Yet I see no reason (http://www.w3.org/TR/xslt#strip) why that node was not stripped. I'm confused?!
LarsH
@Alejandro: PS this happened in Saxon 6.5.5 in Oxygen. There are no xml:space="preserve" attributes anywhere.
LarsH
@Alejandro: Good answer as always, +1.
Dimitre Novatchev
@Skilldrick: No problem. I'm just glad this could help you.
Alejandro
@LarsH: That's the way Saxon deals with whitespace-only text nodes. I'll edit this for you.
Alejandro
@Alejandro - I've accepted yours now because there's some kind of strange bug with the other one, and yours works great.
Skilldrick
@Skilldrick: That may be a problem with the key. I've built this to wrap only adjacent elements imagining that you would probably have more than one list.
Alejandro
@Alejandro Thanks. Dimitre's now updated his version, and it works now.
Skilldrick
@Skilldrick: I'm sure. You should understand both techniques: the fine-grained traversal identity and the fully recursive identity with modes.
Alejandro
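
For readers who have an XSLT 2.0 processor available (an assumption; both answers in this thread target XSLT 1.0, and the Ruby bindings mentioned in the question may only support 1.0), wrapping adjacent runs can also be sketched with `xsl:for-each-group` and `group-adjacent`. This is an alternative technique, not the method used in either answer:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Drop whitespace-only text nodes so that ListItem siblings are truly adjacent. -->
 <xsl:strip-space elements="*"/>
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Identity rule: copy anything not handled by the template below. -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- Any element that has at least one ListItem child. -->
 <xsl:template match="*[ListItem]">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <!-- Adjacent children sharing the same key (true or false) form one group. -->
   <xsl:for-each-group select="node()"
                       group-adjacent="boolean(self::ListItem)">
    <xsl:choose>
     <!-- A run of ListItem elements: wrap it. -->
     <xsl:when test="current-grouping-key()">
      <List>
       <xsl:apply-templates select="current-group()"/>
      </List>
     </xsl:when>
     <!-- A run of anything else: copy it through unchanged. -->
     <xsl:otherwise>
      <xsl:apply-templates select="current-group()"/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

Because each run is grouped on its own, two separate lists under the same parent get two separate `<List>` wrappers. Without `xsl:strip-space`, the whitespace text nodes between items would split every run into single-item groups.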
+3  A: 

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing" match="ListItem[preceding-sibling::*[1][self::ListItem]]"
  use="generate-id(preceding-sibling::ListItem
         [not(preceding-sibling::*[1][self::ListItem])][1])"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="ListItem
         [not(preceding-sibling::*[1][self::ListItem])]">
  <List>
    <xsl:call-template name="identity"/>
    <xsl:apply-templates mode="copy" select="key('kFollowing', generate-id())"/>
  </List>
 </xsl:template>

 <xsl:template match="ListItem[preceding-sibling::*[1][self::ListItem]]"/>

 <xsl:template match="ListItem" mode="copy">
  <xsl:call-template name="identity"/>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<Section>
  <Head>Heading</Head>
  <Para>Blah</Para>
  <ListItem>item 1</ListItem>
  <ListItem>item 2</ListItem>
  <ListItem>item 3</ListItem>
  <ListItem>item 4</ListItem>
  <Para>Something else</Para>
</Section>

produces the wanted result:

<Section>
    <Head>Heading</Head>
    <Para>Blah</Para>
    <List>
        <ListItem>item 1</ListItem>
        <ListItem>item 2</ListItem>
        <ListItem>item 3</ListItem>
        <ListItem>item 4</ListItem>
    </List>
    <Para>Something else</Para>
</Section>
Dimitre Novatchev
@Dimitre: +1 Good answer. I was going to add this type of solution when I saw yours.
Alejandro
Thanks, that worked perfectly!
Skilldrick
@Dimitre - I've discovered a bug with this. I've reproduced the case in my question.
Skilldrick
@Skilldrick: I don't see any change to your question and my transformation produces exactly the required result. ???
Dimitre Novatchev
@Dimitre - Sorry, I've been updating my question. Try it out and see if you have the same problem.
Skilldrick
@Skilldrick: Yes, there was a minor problem. I edited my solution and added 4 characters to the `<xsl:key>`. It works with your new example -- please, try it.
Dimitre Novatchev
@Dimitre - that did it, thanks! I'm going to leave Alejandro's accepted, as he needs the rep more :) It's a bit of an arbitrary decision anyway.
Skilldrick
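
The four characters added to `<xsl:key>` are the trailing `[1]` in the `use` expression. In XSLT 1.0, `generate-id()` applied to a node-set uses the node that comes first in document order. Without `[1]`, every item in a later run is therefore keyed to the start of the first run under the same parent, which is why the second list was merged into the first. With `[1]` on the reverse `preceding-sibling` axis, the nearest run start is picked instead. A minimal diagnostic sketch (not part of either answer; it assumes the `BulletListItem` input from the question) prints both candidates for each non-leading item, relying on `xsl:value-of` following the same first-in-document-order rule:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- Visit every item that directly follows another item. -->
  <xsl:for-each select="//BulletListItem[preceding-sibling::*[1][self::BulletListItem]]">
   <xsl:value-of select="."/>
   <!-- All preceding run starts; value-of reports the first in document order. -->
   <xsl:text> | without [1]: </xsl:text>
   <xsl:value-of select="preceding-sibling::BulletListItem
          [not(preceding-sibling::*[1][self::BulletListItem])]"/>
   <!-- Only the nearest preceding run start. -->
   <xsl:text> | with [1]: </xsl:text>
   <xsl:value-of select="preceding-sibling::BulletListItem
          [not(preceding-sibling::*[1][self::BulletListItem])][1]"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Run against the second XML document in the question, the line for `Second list Item2` should show `Item1` without `[1]` and `Second list Item1` with it, while items in the first run show `Item1` in both columns.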