Hi all,

I am trying to detect and work around this bug in RSS elements. That means I have to find a wrong namespace declaration and change its value to the correct namespace. E.g.:

xmlns:media="http://search.yahoo.com/mrss" 

must be:

xmlns:media="http://search.yahoo.com/mrss/" 

How can I achieve that given an org.w3c.dom.Document?

I meanwhile found out how to get all elements of a certain namespace:

        XPathFactory xpf = XPathFactory.newInstance();
        XPath xpath = xpf.newXPath();
        XPathExpression expr = xpath.compile("//*[namespace-uri()='http://search.yahoo.com/mrss']");


        Object result = expr.evaluate(d, XPathConstants.NODESET);
        if (result != null) {
            NodeList nodes = (NodeList) result;
            for(int node=0;node<nodes.getLength();node++)
            {
                Node n = nodes.item(node);
                this.log.warn("Found old mediaRSS namespace declaration: "+n.getTextContent());
            }

        } 

So now I have to figure out how to change the namespace of a Node via JAXP.

A: 

You could probably do it with XSLT, with a rule like this:

<xsl:template match="media:*">
   <xsl:element name="{local-name()}" namespace="http://search.yahoo.com/mrss/">
      <xsl:apply-templates select="node()|@*"/>
   </xsl:element>
</xsl:template>

where media is bound to "http://search.yahoo.com/mrss".

You may have to tweak the syntax a little, as I'm writing this without the help of a compiler. Also, the output will probably not be nicely formatted (namespace declarations repeated on many elements), but it should be logically correct.

Chris Lercher
Thanks for your reply. However, I am accessing the document on the object level. I am also not sure whether the local prefix will always be "media:". After all, these are RSS feeds made by other people. God knows what prefix they use :-/
er4z0r
They don't have to! You can use any prefix in the XSLT (e.g. "x:*"), all that matters is the namespace. (In other words, the prefix you use in XSLT doesn't have anything to do with the prefix in the XML file.)
Chris Lercher
@er4z0r - the namespace prefix that you declare in your XSLT (i.e. media) does not have to match the namespace prefix in the source document. As long as they both refer to the same URI, the template will match.
Mads Hansen
Just to check whether I got you right: your XSLT would look for all elements that carry the prefix bound to the "wrong" namespace and then put these directly into the correct namespace?
er4z0r
Yes, that's right. You could also try to match "xmlns" attributes, and change them (to get a nicer XML, if you care). But you'll have to change the elements anyway in addition to that.
Chris Lercher
BTW, there may be more performant ways than XSLT, if you're already starting with a DOM document - but you didn't ask that in your original version of the question, so I answered based on the assumption that you were working on XML files.
Chris Lercher
Why would I have to change the elements anyway after changing the xmlns attribute?
er4z0r
Let's say your input is: <a xmlns="example"><b/></a> Now if you change the xmlns attribute to "other", you'll end up with: <a xmlns="other"><b xmlns="example"/></a>. This is because every element node is associated with a namespace, and the element b is still in namespace "example". BTW, not all XML processors will even allow changing the xmlns attribute, but if they do, you will still have to change the element namespaces, because the inheritance of xmlns attributes by child elements is mainly a cosmetic thing that makes human-readable XML files look cleaner.
Chris Lercher
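To make the point above concrete for the DOM case, here is a minimal sketch (not from the thread) that parses the small example, rewrites only the xmlns attribute, and then asks both elements for their namespace. The class name XmlnsAttributeDemo and the URN-style stand-ins "urn:example" / "urn:other" for "example" / "other" are my own assumptions. In a DOM Level 3 implementation such as the one bundled with the JDK, both elements should still report the original namespace, because an element's namespace URI is fixed when the node is created and is not recomputed from attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
class XmlnsAttributeDemo {

    // Namespace that DOM assigns to namespace declaration attributes.
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Returns the namespace URIs of <a> and <b> after only the xmlns attribute on <a> was rewritten. */
    static String[] namespacesAfterAttributeChange() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, getNamespaceURI() returns null
            Document doc = dbf.newDocumentBuilder().parse(
                    new InputSource(new StringReader("<a xmlns=\"urn:example\"><b/></a>")));
            Element a = doc.getDocumentElement();
            Element b = (Element) a.getFirstChild();

            // Change the declaration only; the element nodes themselves are untouched.
            a.setAttributeNS(XMLNS_NS, "xmlns", "urn:other");

            return new String[] { a.getNamespaceURI(), b.getNamespaceURI() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] ns = namespacesAfterAttributeChange();
        System.out.println("a: " + ns[0] + ", b: " + ns[1]);
    }
}
```

So with a DOM, editing the declaration alone is not enough; the element nodes have to be moved to the new namespace as well.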
O.K. Maybe I really should have a look into XSLT and try that.
er4z0r
@chris_I: Sorry, I am trying to solve this via the DOM. Can you tell me what the equivalent XPATH-Selector would be for your code above?
er4z0r
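For anyone who wants to stay on the DOM level, as asked above: one option that avoids XSLT is Document.renameNode from DOM Level 3. The following is only a sketch under assumptions, not something posted in this thread: the class MediaRssNamespaceFixer and its methods are hypothetical names, it walks the tree recursively instead of using the XPath expression from the question, and it only handles elements and xmlns declarations (attributes that live in the wrong namespace are left alone). It also assumes the DOM implementation supports renameNode, which the JDK's bundled one does.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; names are illustrative only.
class MediaRssNamespaceFixer {

    static final String WRONG_NS = "http://search.yahoo.com/mrss";
    static final String RIGHT_NS = "http://search.yahoo.com/mrss/";

    /** Moves every element below (and including) node from WRONG_NS to RIGHT_NS; returns how many were renamed. */
    static int fix(Node node) {
        int renamed = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Correct xmlns / xmlns:prefix declarations that still carry the wrong URI.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())
                        && WRONG_NS.equals(attr.getNodeValue())) {
                    attr.setNodeValue(RIGHT_NS);
                }
            }
            // Move the element itself; the prefix is kept via getNodeName().
            if (WRONG_NS.equals(node.getNamespaceURI())) {
                node = node.getOwnerDocument().renameNode(node, RIGHT_NS, node.getNodeName());
                renamed++;
            }
        }
        // Remember the next sibling first, in case renameNode replaces the child node.
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            renamed += fix(child);
            child = next;
        }
        return renamed;
    }

    /** Runs fix() on a tiny made-up feed and summarises the result. */
    static String demo() {
        String feed = "<rss version=\"2.0\" xmlns:media=\"" + WRONG_NS + "\">"
                + "<channel><media:title>t</media:title></channel></rss>";
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(feed)));
            int renamed = fix(doc);
            Node title = doc.getDocumentElement().getFirstChild().getFirstChild();
            return renamed + " " + title.getNodeName() + " " + title.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the check is on getNamespaceURI() and not on the prefix, it does not matter which prefix the feed author chose.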
A: 

Just for the sake of completeness:

Java Code:

// Run the DOM through the stylesheet; the corrected tree ends up in newDocument.
Document d = out.outputW3CDom(converted);
DOMSource oldDocument = new DOMSource(d);
DOMResult newDocument = new DOMResult();
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xsltsource = new StreamSource(
        getStream(MEDIA_RSS_TRANSFORM_XSL));
Transformer transformer = tf.newTransformer(xsltsource);
transformer.transform(oldDocument, newDocument);
// newDocument.getNode() now returns the transformed document.

// Loads the stylesheet from the classpath, trying the name with and without a leading slash.
private InputStream getStream(String fileName) {
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    InputStream xslStream = loader.getResourceAsStream("/" + fileName);
    if (xslStream == null) {
        // ClassLoader lookups normally expect no leading slash, so this is the usual path.
        xslStream = loader.getResourceAsStream(fileName);
    }
    return xslStream;
}

Stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!--identity transform that will copy matched node/attribute to the output and apply templates for its children and attached attributes-->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="@*|*|text()" />
        </xsl:copy>
    </xsl:template>

    <!--Specialized template to match on elements with the incorrect namespace and generate a new element-->
    <xsl:template match="//*[namespace-uri()='http://search.yahoo.com/mrss']">
        <xsl:element name="{local-name()}" namespace="http://search.yahoo.com/mrss/" >
            <xsl:apply-templates select="@*|*|text()" />
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
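If you want to check the stylesheet without the rest of the application, something like the following self-contained sketch might help. Everything here is an assumption for illustration: the class name MediaRssStylesheetCheck, the inlined copy of the stylesheet (with the shorter match pattern *[namespace-uri()=...], which should select the same elements), and the tiny made-up feed. It transforms the feed into a DOM and counts elements per namespace.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical test harness; names and sample data are illustrative only.
class MediaRssStylesheetCheck {

    static final String WRONG_NS = "http://search.yahoo.com/mrss";
    static final String RIGHT_NS = "http://search.yahoo.com/mrss/";

    // Inlined copy of the stylesheet so the sketch needs no classpath resource.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"node()|@*\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|*|text()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"*[namespace-uri()='" + WRONG_NS + "']\">"
        + "<xsl:element name=\"{local-name()}\" namespace=\"" + RIGHT_NS + "\">"
        + "<xsl:apply-templates select=\"@*|*|text()\"/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Made-up feed that declares the media prefix with the wrong URI.
    static final String FEED =
          "<rss version=\"2.0\" xmlns:media=\"" + WRONG_NS + "\">"
        + "<channel><media:title>t</media:title></channel></rss>";

    /** Applies the stylesheet to the given XML text and returns the result as a DOM document. */
    static Document transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult result = new DOMResult();
            t.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns { elements left in WRONG_NS, elements now in RIGHT_NS } for the sample feed. */
    static int[] countByNamespace() {
        Document out = transform(FEED);
        return new int[] {
            out.getElementsByTagNameNS(WRONG_NS, "*").getLength(),
            out.getElementsByTagNameNS(RIGHT_NS, "*").getLength()
        };
    }
}
```

Note that the identity template only selects attributes, elements and text, so comments and processing instructions in the feed are dropped by this transform.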

Special thanks to Mads Hansen for his help with the XSLT.

er4z0r