I am trying to transform XHTML using an XSLT stylesheet, but I can't even get a basic stylesheet to match anything. I'm sure I'm missing something simple.

Here's my XHTML source document (no big surprises):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 25 March 2009), see www.w3.org" />
...
</body>
</html>

The actual contents don't matter too much, as I'll demonstrate below. By the way, I'm pretty sure the document is well-formed since it was created via tidy -asxml.

My more complex XPath expressions were not returning any results, so as a sanity test, I'm trying to transform it very simply using the following stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
    <xsl:template match="/">
        <xsl:text>---[</xsl:text>
        <xsl:for-each select="html">
            <xsl:text>Found HTML element.</xsl:text>
        </xsl:for-each>
        <xsl:text>]---</xsl:text>
    </xsl:template>
</xsl:stylesheet>

The transform is done via xsltproc --nonet stylesheet.xsl input.html, and the output is "---[]---" (i.e., it didn't find the html element at the root). However, if I change the for-each section to:

<xsl:for-each select="*">
    <xsl:value-of select="name()"/>
</xsl:for-each>

Then I get "---[html]---". And similarly, if I use for-each select="*/*" I get "---[headbody]---" as I would expect.

Why can it find the child element via * (with name() giving the correct name) but it won't find it using the element name directly?
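For comparison, the same symptom can be reproduced outside XSLT. This is a minimal sketch using Python's standard xml.etree.ElementTree, which implements only a small XPath subset, so treat it as an analogy rather than a model of xsltproc; the document is a trimmed, made-up stand-in for the Tidy output. ElementTree returns the html element itself, so the lookups go one level down, as in the */* experiment.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed stand-in for the Tidy output: the default xmlns on <html>
# puts html, head, body (and every unprefixed descendant) into the XHTML namespace.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title></head>
<body><div id="foo">hello</div></body>
</html>"""

root = ET.fromstring(DOC)  # ElementTree hands back the <html> element itself

# Analogous to select="*/*": a wildcard matches children whatever their namespace.
wildcard = [child.tag for child in root.findall("*")]

# Analogous to select="html/head": an unprefixed name means "head in no namespace".
by_plain_name = root.findall("head")

print(root.tag)       # {http://www.w3.org/1999/xhtml}html
print(wildcard)
print(by_plain_name)  # []
```

Unlike XPath's name(), ElementTree prints the expanded name, which makes the XHTML namespace URI visible and shows why these elements never match the plain name head.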

+2  A: 

The html element in your source XML declares a default namespace (xmlns="http://www.w3.org/1999/xhtml"). You have to bind a prefix to that namespace in your xsl:stylesheet element and use the prefix in your select expression:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
    <xsl:template match="/">
        <xsl:text>---[</xsl:text>
        <xsl:for-each select="html:html">
            <xsl:text>Found HTML element.</xsl:text>
        </xsl:for-each>
        <xsl:text>]---</xsl:text>
    </xsl:template>
</xsl:stylesheet>
Frédéric Hamidi
If I do that, I get "XPath error : Undefined namespace prefix". Do I need to add something to the stylesheet to say how to resolve the namespace?
Tadmas
The HTML namespace will need to be declared in the stylesheet as well. Typically, on the document element of the stylesheet: `<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:html="http://www.w3.org/1999/xhtml">`, but it can also be declared on the for-each element: `<xsl:for-each select="html:html" xmlns:html="http://www.w3.org/1999/xhtml">`
Mads Hansen
Sorry, forgot the namespace reference in the `xsl:stylesheet` element. I updated my answer.
Frédéric Hamidi
Works beautifully. I wish I could omit typing that namespace, though, since more complex XPath expressions like html:html/html:body/html:div[@id='foo']/... are going to get really hard to read. Sigh... at least it works now. :)
Tadmas
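A rough illustration of that readability trade-off, again with Python's xml.etree.ElementTree standing in for XPath (assumes Python 3.8+ for the {*} wildcard; the two-div document is made up). A one-letter prefix keeps a path like the one above short, and @id needs no prefix because a default namespace declaration never applies to attributes. In XSLT 1.0 the closest namespace-ignoring form is a predicate such as *[local-name()='div'], which is longer, not shorter.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample document with a default namespace.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title></head>
<body><div id="foo">hello</div><div id="bar">bye</div></body>
</html>"""

root = ET.fromstring(DOC)

# A one-letter prefix keeps a deep path readable; only the URI it is bound to matters.
# The attribute test @id stays unprefixed: attributes are not in the default namespace.
ns = {"x": XHTML_NS}
prefixed = root.findall("x:body/x:div[@id='foo']", ns)

# Python 3.8+ only: "{*}" matches a tag in any namespace (or none),
# trading away the namespace check for brevity.
wildcard = root.findall("{*}body/{*}div[@id='foo']")

print([d.text for d in prefixed])  # ['hello']
print([d.text for d in wildcard])  # ['hello']
```

The wildcard is convenient but would also match body and div elements from any other namespace, so the prefixed form is the safer default.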
+1  A: 

Change your stylesheet from:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/> 
    <xsl:template match="/"> 
        <xsl:text>---[</xsl:text> 
        <xsl:for-each select="html"> 
            <xsl:text>Found HTML element.</xsl:text> 
        </xsl:for-each> 
        <xsl:text>]---</xsl:text> 
    </xsl:template> 
</xsl:stylesheet> 

to:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml"
> 
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/> 
    <xsl:template match="/"> 
        <xsl:text>---[</xsl:text> 
        <xsl:for-each select="x:html"> 
            <xsl:text>Found HTML element.</xsl:text> 
        </xsl:for-each> 
        <xsl:text>]---</xsl:text> 
    </xsl:template> 
</xsl:stylesheet> 

Explanation:

The XML document declares a default namespace, "http://www.w3.org/1999/xhtml", on its top element, so that element and every unprefixed element descending from it belong to this namespace.

On the other hand, in XPath 1.0 any unprefixed name in a path expression is considered to be in "no namespace".

Therefore, the <xsl:for-each select="html"> instruction selects, and applies its body to, only html elements that are in "no namespace". There are no such elements in the document: the only html element belongs to the XHTML namespace.
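The "unprefixed means no namespace" rule can be checked directly. A minimal sketch, assuming Python's xml.etree.ElementTree as an analogy (its path syntax treats unprefixed names the same way), with two made-up one-line documents that differ only in the xmlns declaration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Same markup twice: once with a default namespace declaration, once without.
WITH_NS = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
NO_NS = "<html><body/></html>"


def count_plain(doc: str) -> int:
    """Count children matched by the unprefixed name 'body' (a no-namespace lookup)."""
    return len(ET.fromstring(doc).findall("body"))


def count_qualified(doc: str) -> int:
    """Count children matched by the expanded name {XHTML namespace}body."""
    return len(ET.fromstring(doc).findall(f"{{{XHTML_NS}}}body"))


print(count_plain(NO_NS), count_plain(WITH_NS))          # 1 0
print(count_qualified(NO_NS), count_qualified(WITH_NS))  # 0 1
```

Adding the xmlns attribute changes nothing in the markup's shape, yet it flips which query matches: the element's name now includes the namespace URI.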

Solution:

Names that belong to a default namespace cannot be referenced unprefixed in XPath 1.0. Therefore, we need to bind a prefix to the namespace such an element belongs to, as the xmlns:x="http://www.w3.org/1999/xhtml" declaration on xsl:stylesheet does above. With the prefix "x" bound, any such element can be referenced as x:name, for example x:html.

Dimitre Novatchev