I'm using the following Python code to search for a node in an XML file and change the value of an attribute of one of its children. The changes happen correctly when the node is displayed using toxml(). But when it is written to a file, the attributes rearrange themselves (as seen in the source and final XML below). Could anyone explain how and why this happens? Python code:

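#!/usr/bin/env python
# Python 2 script; xml.dom.ext (PrettyPrint) comes from the PyXML package.
import xml.dom.ext
from xml.dom.minidom import parse

dom = parse("max.xml")

# Find the "store" node and update the value of one of its <property> children.
for sku in dom.getElementsByTagName("node"):
    if sku.getAttribute("name") == "store":
        # childNodes also counts whitespace text nodes, hence the odd indexes:
        # [1] is <match>, and [5] inside it is the third <property>.
        sku.childNodes[1].childNodes[5].setAttribute("value", "Delhi,India")
        print sku.toxml()

# Write the modified document to a new file.
xml.dom.ext.PrettyPrint(dom, open("new.xml", "w"))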

Part of the source XML:

<node name='store' node_id='515' module='mpx.lib.node.simple_value.SimpleValue'  config_builder=''  inherant='false' description='Configurable Value'>
          <match>
            <property name='1' value='point'/>
            <property name='2' value='0'/>
            <property name='val' value='Store# 09204 Staten Island, NY'/>
            <property name='3' value='str'/>
          </match>
        </node>

Final XML:

<node config_builder="" description="Configurable Value" inherant="false" module="mpx.lib.node.simple_value.SimpleValue" name="store" node_id="515">
              <match>
                <property name="1" value="point"/>
                <property name="2" value="0"/>
                <property name="val" value="Delhi,India"/>
                <property name="3" value="str"/>
              </match>
            </node>
+1  A: 

There is no guarantee on the ordering of sub-elements or attributes in the XML spec. You should NOT rely on the ordering of attributes or sub-elements in your business logic; it is guaranteed not to work as expected across all the various parsers. As a side note, I think ElementTree is a much better way to manipulate the DOM than minidom, especially since it is built in if you are using Python 2.5.x or newer.

fuzzy lollipop
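For readers curious what the ElementTree suggestion might look like, here is a minimal sketch of the same edit. It is written for Python 3, uses a cut-down inline copy of the question's XML wrapped in a made-up <root> element, and the helper name set_store_value is hypothetical. It looks the property up by its name attribute instead of by child index, which is an assumption about what the asker intended.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for max.xml; the <root> wrapper is an assumption.
SOURCE = """<root>
  <node name='store' node_id='515'>
    <match>
      <property name='1' value='point'/>
      <property name='val' value='Store# 09204 Staten Island, NY'/>
    </match>
  </node>
</root>"""


def set_store_value(root, new_value):
    """Set 'value' on the property named 'val' under every node named 'store'.

    Returns the number of properties that were changed.
    """
    changed = 0
    for node in root.iter("node"):
        if node.get("name") != "store":
            continue
        for prop in node.iter("property"):
            if prop.get("name") == "val":
                prop.set("value", new_value)
                changed += 1
    return changed


root = ET.fromstring(SOURCE)
count = set_store_value(root, "Delhi,India")

# Serialize the modified tree back to text.
output = ET.tostring(root, encoding="unicode")
```

Matching on the name attribute avoids counting whitespace text nodes, which is what forces the childNodes[1].childNodes[5] indexing in the minidom version.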
Is ElementTree better because of the reasons stated above, or do you have any others?
fixxxer
It is a much cleaner, easier-to-use API. No parser should guarantee order; it is implementation-dependent, and you should never depend on the order of attributes or sub-elements.
fuzzy lollipop
Whoa. Attributes can be shuffled without changing the meaning of the document. Not so with sub-elements. Suppose I have a <body> containing a few <p>s. Am I to expect them to be arbitrarily rearranged by the DOM? No. I can and do expect the DOM to preserve the meaning of my document and the order of all sub-elements of the root. Note how sub-elements are implemented as a NodeList, which preserves order, not a NamedNodeMap.
Amos Newcombe
@Amos - I like the way you think. :) @fuzzyLollipop - what do we do in the case Amos pointed out?
fixxxer
HTML is not XML, and <p> is not a valid XML element. In the example above, the <property> elements can be read in and written out in any order; it should not matter.
fuzzy lollipop
<p> is certainly a valid XHTML element, and XHTML is XML. And order of subelements is preserved by the DOM, which uses a NodeList to hold them. If your document is storing data, then maybe order does not matter (although the DTD should have something to say about that), but there's more to XML than that. @fixxxer, What do we do? There's nothing we need to do, it all works as is. Although "working" still means attributes can be shuffled arbitrarily; it's only sub-elements that I am making this point about.
Amos Newcombe
Actually, <p> is NOT valid XML without a closing tag; a closing tag such as </p> is required when writing XHTML: http://webdesign.about.com/od/htmltags/a/aabg092299a.htm. The W3C Validator shows <p/> as valid as well, but your assertion is wrong: <p> is NOT valid XHTML.
fuzzy lollipop
Not by itself, no. But a paragraph tag with contents and a closing tag is valid XHTML. And if you have several of them in your document, their order will be preserved by the DOM. And that's a good thing.
Amos Newcombe
You are still confusing DOM with XML parsing. Just because XHTML is XML doesn't mean that XML parsers are implemented like the DOM. The DOM is a very specific implementation that may or may not be doing XML parsing in the case of XHTML. The DOM has its own specification, which has __nothing__ to do with XML. You claim that <p> is valid XML; it isn't. The DOM is __not__ an XML parser; it is an XHTML parser that behaves differently from XML specification parsers. Stop trying to argue about something you are confused about. DOM != XML parser.
fuzzy lollipop
__Lawyer among the specifications:__ The first thing to be aware of, and which might surprise you, is that the XML 1.0 specification itself does not guarantee element order in the sections on well-formedness (the sections on validity are more relevant to the discussion later in this article). The XML 1.0 well-formedness definition specifically states that attributes are unordered, but says nothing about elements. This means that technically speaking, a conforming XML parser might decide to report the child elements in any order.
fuzzy lollipop
OK, but the original question was about DOM implementations, which are supposed to meet everybody's needs, including those of us for whom order is important. And in fact DOM child nodes are specified as members of a NodeList (http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-1451460987), and in practice they are never rearranged, even if doing so would still produce conformant XML. And this is a good and necessary thing for some of us.
Amos Newcombe
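The NodeList point in the comments above is easy to check against minidom itself. The following is a small sketch for Python 3 with a made-up three-paragraph document; it shows what this one DOM implementation does and does not settle what the XML specification requires.

```python
from xml.dom.minidom import parseString

# Made-up document with no whitespace between elements, so every
# child node of <body> is a <p> element.
doc = parseString("<body><p>first</p><p>second</p><p>third</p></body>")

# childNodes is a NodeList: an ordered sequence of child nodes.
texts = [p.firstChild.data for p in doc.documentElement.childNodes]

# Serialize, parse again, and read the children a second time.
reparsed = parseString(doc.toxml())
texts_after = [p.firstChild.data for p in reparsed.documentElement.childNodes]

print(texts)
print(texts_after)
```

In this sketch the paragraphs come back in document order both before and after the round trip, which matches the behaviour described for NodeList.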
+2  A: 

Per XML's standards for the DOM, attributes are not held as an ordered collection; in Python's xml.dom implementations, they're a NamedNodeMap, whose docs say:

The order you get the attributes in is arbitrary but will be consistent for the life of a DOM

In particular, there's no promise that this arbitrary order will be the same as the (semantically irrelevant) order found in the XML source that was parsed to build this DOM.

Alex Martelli
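Since attribute order carries no meaning, one way to check that a rewritten file is still equivalent is to compare attributes by name instead of by position. The following is a minimal sketch for Python 3 using shortened versions of the question's two <node> tags; the helper name attributes_as_dict is hypothetical.

```python
from xml.dom.minidom import parseString


def attributes_as_dict(element):
    """Return the element's attributes as a plain {name: value} dict."""
    # element.attributes is a NamedNodeMap; items() yields (name, value) pairs.
    return dict(element.attributes.items())


# Same attributes, different order and quoting, as in the question.
before = parseString("<node name='store' node_id='515' module='m'/>").documentElement
after = parseString('<node module="m" name="store" node_id="515"/>').documentElement

# Dict comparison ignores order, so the two elements compare equal.
same = attributes_as_dict(before) == attributes_as_dict(after)
print(same)
```

As far as I know, the standard-library serializers in Python 3.8 and later keep attributes in the order they were specified, but code that compares by name does not depend on that.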