Hi, I want to work around a 'bug' in certain RSS feeds that use an incorrect namespace for the Media RSS module. I tried manipulating the DOM programmatically, but XSLT seems more flexible to me.

Example:

<media:thumbnail xmlns:media="http://search.yahoo.com/mrss" url="http://www.suedkurier.de/storage/pic/dpa/infoline/brennpunkte/4311018_0_merkelxI_24280028_original.large-4-3-800-199-0-3131-2202.jpg" />
<media:thumbnail url="http://www.suedkurier.de/storage/pic/dpa/infoline/brennpunkte/4311018_0_merkelxI_24280028_original.large-4-3-800-199-0-3131-2202.jpg" />

The namespace must be http://search.yahoo.com/mrss/ (mind the trailing slash).

This is my stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="//*[namespace-uri()='http://search.yahoo.com/mrss']">
        <xsl:element name="{local-name()}" namespace="http://search.yahoo.com/mrss/" >
            <xsl:apply-templates select="@*|*|text()" />
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

Unfortunately, the result of the transformation is invalid XML, and my RSS parser (the ROME library) no longer parses the feed:

java.lang.IllegalStateException: Root element not set
    at org.jdom.Document.getRootElement(Document.java:218)
    at com.sun.syndication.io.impl.RSS090Parser.isMyType(RSS090Parser.java:58)
    at com.sun.syndication.io.impl.FeedParsers.getParserFor(FeedParsers.java:72)
    at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:273)
    at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:251)
    ... 8 more

What is wrong with my stylesheet?

A:

You have half of the solution in your stylesheet.

You have put in a template to match (and correct) the elements with the wrong Media RSS namespace, but you don't have anything to match the other elements/attributes in the RSS feed.

The built-in template rules match the rest of the document. For elements they only recurse into the children, and for text nodes and attributes they output the string value, so only text reaches the output. The original element structure of the RSS feed is lost, and the result is no longer a well-formed RSS document.
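To see this effect in isolation, here is a minimal sketch that runs the stylesheet from the question through the JDK's built-in XSLT processor. The class name `BuiltInRulesDemo` and the sample feed are made up for illustration and are not part of the original question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BuiltInRulesDemo {
    // The stylesheet from the question: only the namespace-fixing template,
    // so every other node falls through to the built-in template rules.
    static final String QUESTION_STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match=\"//*[namespace-uri()='http://search.yahoo.com/mrss']\">"
      + "<xsl:element name='{local-name()}' namespace='http://search.yahoo.com/mrss/'>"
      + "<xsl:apply-templates select='@*|*|text()'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample feed (assumption: invented for this sketch).
    static final String SAMPLE_FEED =
        "<rss version='2.0'><channel><title>Example feed</title><item>"
      + "<media:thumbnail xmlns:media='http://search.yahoo.com/mrss'"
      + " url='http://example.com/a.jpg'/>"
      + "</item></channel></rss>";

    // Applies the given stylesheet to the given XML and returns the serialized result.
    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The rss, channel, title and item tags are gone; only the title text
        // and the rewritten media element remain.
        System.out.println(transform(QUESTION_STYLESHEET, SAMPLE_FEED));
    }
}
```

The title text ends up outside any element, which is why the parser can no longer find a root element to work with. The exact serialization may differ between XSLT processors.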

Adding an identity transform template will ensure that the other nodes and attributes get copied into the output and will preserve the document content/structure.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!--identity transform that will copy the matched node/attribute to the output and apply templates for its children and attached attributes-->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|*|text()" />
        </xsl:copy>
    </xsl:template>

    <!--Specialized template to match on elements with the incorrect namespace and generate a new element-->
    <xsl:template match="//*[namespace-uri()='http://search.yahoo.com/mrss']">
        <xsl:element name="{local-name()}" namespace="http://search.yahoo.com/mrss/" >
            <xsl:apply-templates select="@*|*|text()" />
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
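
Since the question uses Java, here is a rough sketch of how the stylesheet above could be applied with the JDK's `javax.xml.transform` API before the feed is handed to the parser. The class name `MediaNamespaceFix`, the method `fix`, and the sample feed are assumptions made up for illustration. The stylesheet is inlined as a string only to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class MediaNamespaceFix {
    // The stylesheet from this answer: identity template plus the namespace-fixing template.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='node()|@*'>"
      + "<xsl:copy><xsl:apply-templates select='@*|*|text()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"//*[namespace-uri()='http://search.yahoo.com/mrss']\">"
      + "<xsl:element name='{local-name()}' namespace='http://search.yahoo.com/mrss/'>"
      + "<xsl:apply-templates select='@*|*|text()'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample feed (assumption: invented for this sketch).
    static final String SAMPLE_FEED =
        "<rss version='2.0'><channel><title>Example feed</title><item>"
      + "<media:thumbnail xmlns:media='http://search.yahoo.com/mrss'"
      + " url='http://example.com/a.jpg'/>"
      + "</item></channel></rss>";

    // Returns the feed with Media RSS elements moved to the correct namespace.
    static String fix(String feedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(feedXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The rss/channel/item structure is preserved, and the thumbnail element
        // now lives in http://search.yahoo.com/mrss/ (with the trailing slash).
        System.out.println(fix(SAMPLE_FEED));
    }
}
```

The prefix on the rewritten element may differ from `media`, because `local-name()` drops the original prefix and the processor is free to pick one or use a default namespace declaration. A namespace-aware parser only looks at the namespace URI, so that should not matter.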
Mads Hansen
OK. I definitely need to invest more time into XSL/XPATH etc!
er4z0r