Hi there, I am fairly new to XSLT (2.0) and am having some trouble with a tricky issue. Essentially, I have a badly formatted HTML file like the one below:

    <html>
    <body>

    <p> text 1 </p>
    <div> <p> text 2</p> </div>
    <p> Here is a list
        <ul>
            <ol>
                <li> ListItem1 </li>
                <li> ListItem1 </li>
            </ol>
            <dl>
                <li> dl item </li>
                <li> dl item2 </li>
            </dl>
        </ul>
        <div>
            <p> I was here</p>
        </div>
    </p>
    </body>
    </html>

And I am trying to convert it into a nicely formatted XML file. In my XSLT file I recursively check whether all children of a p or div are other p's or div's and, if so, just promote them; otherwise I use them as standalone paragraphs. I extended this idea so that a p or div with a child list shows up properly, without promoting the list's children.

The problem I am having is that the output XML I get is the following:

    <?xml version="1.0" encoding="utf-8"?><html>
    <body>

    <p> text 1 </p>
     <p> text 2</p> 
     Here is a list
    <ul>
        <ol> 
            <li> ListItem1 </li>
            <li> ListItem1 </li>
        </ol>
        <dl>
            <li> dl item </li>
            <li> dl item2 </li>
        </dl>
    </ul> 

    <p> I was here</p>



    </body>
    </html>

"Here is a list" needs to be in paragraph tags too! I am going crazy trying to solve this ... Any input/links would be greatly appreciated.

A: 

You could first check whether a <p> has a closing </p> tag. If it doesn't, take all the text you find until you reach the next tag (a <p>, <div>, <li> or anything like that) and simply copy it into your XML file, wrapped in a complete <p></p> structure.

This is how I would do it; it might not be the best way, but it will work.
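
A minimal sketch of this idea as a second XSLT pass. It assumes the pass runs over the output shown in the question, where the stray text ends up as a direct child of <body>. The match pattern and the choice of <p> as the wrapper are assumptions for illustration, not something taken from the question.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that no other template handles -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: loose text is a direct child of body.
       Whitespace-only text nodes do not match and are copied as they are. -->
  <xsl:template match="body/text()[normalize-space()]">
    <p>
      <xsl:value-of select="."/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the output in the question, this should leave the existing elements untouched and put "Here is a list" inside its own <p>.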

Marthin
+1  A: 

This transformation:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

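 <!-- Identity rule: copy every node and attribute unchanged -->
 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <!-- A div or p that contains another div or p is not copied itself;
      only its children are processed, which promotes them one level up -->
 <xsl:template match=
  "div[descendant::div or descendant::p]
  |
   p[descendant::div or descendant::p]
  ">
   <xsl:apply-templates/>
 </xsl:template>

 <!-- Text directly inside such a div or p is wrapped in a new element
      named after its parent, so it survives as a paragraph of its own -->
 <xsl:template match=
  "div[descendant::div or descendant::p]/text()
  |
   p[descendant::div or descendant::p]/text()
  ">
   <xsl:element name="{name(..)}"
        namespace="{namespace-uri(..)}">
     <xsl:copy-of select="."/>
   </xsl:element>
 </xsl:template>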
</xsl:stylesheet>

when applied to the provided XML document, produces the wanted, correct output:

<html>
   <body>
      <p> text 1 </p>
      <p> text 2</p>
      <p> Here is a list

      </p>
      <ul>
         <ol>
            <li> ListItem1 </li>
            <li> ListItem1 </li>
         </ol>
         <dl>
            <li> dl item </li>
            <li> dl item2 </li>
         </dl>
      </ul>
      <p> I was here</p>
   </body>
</html>
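
The wrapped paragraph in this output keeps the whitespace that surrounded the original text node. If that is unwanted, one possible variant is sketched below. It assumes the stylesheet above is saved as flatten.xsl (a hypothetical file name) and overrides only the text-wrapping template. Note that normalize-space() also collapses runs of whitespace inside the text, which may or may not be acceptable.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- flatten.xsl is a hypothetical name for the stylesheet shown above.
       Templates defined here take precedence over the imported ones. -->
  <xsl:import href="flatten.xsl"/>

  <!-- Same match pattern as the original template, but the text is trimmed -->
  <xsl:template match=
   "div[descendant::div or descendant::p]/text()
   |
    p[descendant::div or descendant::p]/text()
   ">
    <xsl:element name="{name(..)}"
         namespace="{namespace-uri(..)}">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

With this override the paragraph should come out as <p>Here is a list</p>, and the rest of the result should be unchanged.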
Dimitre Novatchev
Thanks very much for your help! It is extremely difficult to find people who know XSLT. Cheers.
yunje