My Python XML parser fails if there's a comment at the beginning of an XML file, like this:

<?xml version="1.0" encoding="utf-8"?>
<!-- Script version: "1"-->
<!-- Date: "07052010"-->
<component name="abc">
<pp>
    ....
</pp>
</component>

Is it illegal to place a comment like this?

EDIT:

Well, it's not throwing an error, but the DOM module fails to recognize the child nodes:

import xml.dom.minidom as dom
sub_tree = dom.parse('xyz.xml')
for component in sub_tree.firstChild.childNodes:
    print(component)

I cannot access the child nodes: sub_tree.firstChild.childNodes returns an empty list. But if I remove those two comments, I can loop through the list and read the child nodes as usual!

EDIT:

Guys, this simple example works and is enough to reproduce the problem. Start your Python shell and run the small snippet above. With the comments in the file it prints nothing; after deleting the comments it prints the child nodes!

A: 

That should be legal as long as the XML declaration is on the first line.

esalaka
+1  A: 

It is legal. From the XML 1.0 specification:

2.5 Comments

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor MAY, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments.] Parameter entity references MUST NOT be recognized within comments.
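
A quick way to check both rules from the quoted definition with the standard library. This is only a sketch; the two small documents (`ok` and `bad`) are made up for illustration and are not from the question:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# A comment between the XML declaration and the root element is well-formed.
ok = b'<?xml version="1.0" encoding="utf-8"?>\n<!-- a comment -->\n<root/>'
doc = minidom.parseString(ok)
root_name = doc.documentElement.tagName

# A double hyphen inside a comment is not allowed, so the parser rejects it.
bad = b'<root><!-- not -- allowed --></root>'
try:
    minidom.parseString(bad)
    rejected = False
except ExpatError:
    rejected = True

print(root_name, rejected)
```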

systempuntoout
+1  A: 

To get better answers, show us (a) a small complete Python script and (b) a small complete XML document that together demonstrate the unexpected behaviour.

Have you considered using ElementTree?

John Machin
+1  A: 

If you do this:

import xml.dom.minidom as dom
sub_tree = dom.parse('xyz.xml')
print(sub_tree.childNodes)

you will see what the problem is:

>>> print(sub_tree.childNodes)
[<DOM Comment node " Script ve...">, <DOM Comment node " Date: "07...">, <DOM Element: component at 0x7fecf88c>]

firstChild will obviously pick up the first child, which is a comment and doesn't have any children of its own. You could iterate over the children and skip all comment nodes.
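
A minimal sketch of both ways around it, assuming a document shaped like the one in the question (inlined here as `XML`, with the elided content left out, so the snippet runs without a file). The Document node is the single parent of the two comments and the root element; `documentElement` gives you the root element directly, and filtering on `nodeType` skips comments and whitespace text nodes:

```python
import xml.dom.minidom as dom

# Sample document modelled on the one in the question.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<!-- Script version: "1"-->
<!-- Date: "07052010"-->
<component name="abc">
<pp>
</pp>
</component>
"""

doc = dom.parseString(XML)

# The Document node has three children: two comments and the root element.
top_level = doc.childNodes

# Option 1: ask for the root element directly instead of using firstChild.
root = doc.documentElement

# Option 2: keep only element nodes, skipping comments and text nodes.
children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

print(root.tagName)
print([c.tagName for c in children])
```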

Or you could ditch the DOM model and use ElementTree, which is so much nicer to work with. :)
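
For comparison, roughly the same thing with ElementTree. Again a sketch using an inlined copy of the question's document (`XML` is my stand-in for the file). With the default parser, `fromstring` returns the root element and the leading comments never show up:

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the one in the question.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<!-- Script version: "1"-->
<!-- Date: "07052010"-->
<component name="abc">
<pp>
</pp>
</component>
"""

# fromstring returns the root element itself, not a document wrapper.
root = ET.fromstring(XML)
print(root.tag, root.attrib["name"])

# Iterating an element yields its child elements only.
child_tags = [child.tag for child in root]
print(child_tags)
```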

Mattias Nilsson
Thanks, this is what I was looking for! Problem solved.
wanderameise
Another question: there are 3 nodes, 2 comments and one element node, but where's the root element? Valid XML files are allowed to have only ONE root element! Or does the parser treat comments differently? I think there must be one parent element!
wanderameise