Hi,

We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags carry a namespace prefix and sometimes they do not, which makes calls to Element.selectSingleNode(String s) fail.

For now we have three solutions, and we are not happy with any of them:

1 - Remove all namespace occurrences before doing anything with the XML document

xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
xml = xml.replaceAll("ds:", "");
xml = xml.replaceAll("etm:", "");
[...] // and so on for each kind of namespace

2 - Remove the namespace just before getting a node, by calling

Element.remove(Namespace ns)

But it only works for a node and its first level of children

3 - Clutter the code with

node = rootElement.selectSingleNode(NameWithoutNameSpace);
if (node == null)
    node = rootElement.selectSingleNode(NameWithNameSpace);

So, what do you think? Which one is the least bad? Do you have another solution to propose?

+1  A: 

Option 1 is dangerous because you can't guarantee the prefixes for a given namespace without pre-parsing the document, and because you can end up with namespace collision. If you're consuming a document and not outputting anything, it might be ok, depending on the source of the doc, but otherwise it just loses too much information.
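To make the risk concrete, here is a small sketch that mirrors the string replacements from option 1. The class and method names (`PrefixStripDemo`, `stripPrefixes`) and the sample XML are made up for illustration; only the replacement logic comes from the question.

```java
final class PrefixStripDemo {
    private PrefixStripDemo() {}

    /**
     * Hypothetical helper: drops default-namespace declarations and the
     * given prefixes by plain text replacement, as option 1 does.
     */
    static String stripPrefixes(String xml, String... prefixes) {
        String out = xml.replaceAll("xmlns=\"[^\"]*\"", "");
        for (String prefix : prefixes) {
            // Plain replacement cannot tell markup from text content.
            out = out.replace(prefix + ":", "");
        }
        return out;
    }

    public static void main(String[] args) {
        // Two elements from different namespaces collapse into the same name.
        System.out.println(stripPrefixes("<ds:id>1</ds:id><etm:id>2</etm:id>", "ds", "etm"));
        // Text content that happens to contain "ds:" is altered as well.
        System.out.println(stripPrefixes("<note>odds: 3</note>", "ds"));
    }
}
```

The first call shows the collision: once the prefixes are gone, nothing distinguishes the two `id` elements. The second shows that the replacement also touches ordinary text.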

Option 2 could be applied recursively, but it has many of the same problems as option 1.

Option 3 sounds like the best approach, but rather than clutter your code, make a static method that does both checks rather than putting the same if statement throughout your codebase.
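A minimal sketch of such a helper follows. It is written against a plain `Function` so that it compiles without dom4j on the classpath; the names `NodeLookup` and `firstMatch` are hypothetical, and the lookup keys in `main` are placeholders rather than real element names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class NodeLookup {
    private NodeLookup() {}

    /**
     * Tries each candidate name in order and returns the first non-null
     * result of the lookup, or null if every candidate misses.
     */
    @SafeVarargs
    static <T> T firstMatch(Function<String, ? extends T> lookup, String... candidates) {
        for (String candidate : candidates) {
            T result = lookup.apply(candidate);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A map stands in for the document; only the prefixed name is present.
        Map<String, String> fakeDocument = new HashMap<>();
        fakeDocument.put("ds:Signature", "prefixed node");
        System.out.println(firstMatch(fakeDocument::get, "Signature", "ds:Signature"));
    }
}
```

With dom4j the call site would presumably look like `NodeLookup.firstMatch(rootElement::selectSingleNode, nameWithoutNamespace, nameWithNamespace)`, which keeps the fallback in one place instead of repeating the `if` throughout the codebase.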

The best approach is to get whoever is sending you the bad XML to fix it. Of course, this raises the question of whether it is actually broken. Specifically, are you getting XML where the default namespace is defined as X and then a namespace also representing X is given a prefix of 'es'? If this is the case, then the XML is well formed and you just need code that is agnostic about the prefix but still uses a qualified name to fetch the element. I'm not familiar enough with dom4j to know whether creating a Namespace with a null prefix will cause it to match all elements with a matching URI or only those with no prefix, but it's worth experimenting with.
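One prefix-agnostic variant that can be tried without touching the document is an XPath that matches on `local-name()` only. The sketch below uses the JDK's built-in DOM and XPath classes rather than dom4j so that it runs on its own; the class name `LocalNameLookup`, the `urn:example` URI and the `order`/`id` element names are all invented for the example. Note that it ignores the namespace URI entirely, which goes further than the URI-qualified lookup described above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class LocalNameLookup {
    private LocalNameLookup() {}

    /** Parses XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Builds an absolute XPath whose steps match by local name only. */
    static String localNamePath(String... names) {
        StringBuilder path = new StringBuilder();
        for (String name : names) {
            // Assumes the names contain no quote characters.
            path.append("/*[local-name()='").append(name).append("']");
        }
        return path.toString();
    }

    /** Returns the text of the first matching node, or null if none matches. */
    static String textAt(Document doc, String... names) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(localNamePath(names), doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<order><id>7</id></order>";
        String prefixed = "<etm:order xmlns:etm=\"urn:example\"><etm:id>7</etm:id></etm:order>";
        System.out.println(textAt(parse(plain), "order", "id"));
        System.out.println(textAt(parse(prefixed), "order", "id"));
    }
}
```

The same kind of expression, for example `*[local-name()='id']`, can probably be passed to dom4j's `selectSingleNode`, but that is an assumption that has not been tested here.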

Jherico
I will try digging through the docs about namespaces with a null prefix. Thanks anyway. About the source of the XML files: there is no way they will change anything. But the files are valid with or without the namespace. We build objects from the files and use them in our system, but we never write anything (we don't have the right to modify the XML files).
Antoine Claval
A: 

Following is some code that I found and now use. It might be useful if you are looking for a generic way to remove all namespaces from a dom4j document.

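    // Imports needed by the helper methods below.
    import java.util.List;
    import org.dom4j.Attribute;
    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.Namespace;
    import org.dom4j.Node;
    import org.dom4j.QName;

    public static void removeAllNamespaces(Document doc) {
        Element root = doc.getRootElement();
        if (root.getNamespace() != Namespace.NO_NAMESPACE) {
            // Only the root's content is processed; the root element itself keeps its QName.
            removeNamespaces(root.content());
        }
    }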

    public static void unfixNamespaces(Document doc, Namespace original) {
        Element root = doc.getRootElement();
        if (original != null) {
            setNamespaces(root.content(), original);
        }
    }

    public static void setNamespace(Element elem, Namespace ns) {

        elem.setQName(QName.get(elem.getName(), ns,
                elem.getQualifiedName()));
    }

    /**
     * Recursively removes the namespace of the element and all its
     * children: sets it to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(Element elem) {
        setNamespaces(elem, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively removes the namespace of the nodes in the list and all
     * their children: sets it to Namespace.NO_NAMESPACE.
     */
    public static void removeNamespaces(List l) {
        setNamespaces(l, Namespace.NO_NAMESPACE);
    }

    /**
     * Recursively sets the namespace of the element and all its children.
     */
    public static void setNamespaces(Element elem, Namespace ns) {
        setNamespace(elem, ns);
        setNamespaces(elem.content(), ns);
    }

    /**
     * Recursively sets the namespace of the nodes in the list and all
     * their children.
     */
    public static void setNamespaces(List l, Namespace ns) {
        Node n = null;
        for (int i = 0; i < l.size(); i++) {
            n = (Node) l.get(i);

            if (n.getNodeType() == Node.ATTRIBUTE_NODE) {
                ((Attribute) n).setNamespace(ns);
            }
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                setNamespaces((Element) n, ns);
            }            
        }
    }

Hope this is useful for someone who needs it!

Abhishek
Couldn't make this code work. I used the XML-with-namespaces sample from w3schools, but it's as if dom4j doesn't recognize the namespaces. The first if (root.getNamespace() != Namespace.NO_NAMESPACE) evaluates to true, and even if I remove the if, it still does nothing.
Dan