I wrote a simple tool to generate a DBUnit XML dataset using queries that the user enters. I want to include each query entered in the XML as a comment, but the DBUnit API to generate the XML file doesn't support inserting the comment where I would like it (above the data it generates), so I am resorting to putting the comment with ALL queries either at the top or bottom.

So my question: is it valid XML to place it at either location? For example, above the XML Declaration:

<!-- Queries used: ... -->
<?xml version='1.0' encoding='UTF-8'?>
<dataset>
  ...
</dataset>

Or below the root node:

<?xml version='1.0' encoding='UTF-8'?>
<dataset>
  ...
</dataset>
<!-- Queries used: ... -->

I plan to try placing it above the XML declaration first, but I doubt whether that is valid XML, despite this claim from Wikipedia:

Comments can be placed anywhere in the tree, including in the text if the content of the element is text or #PCDATA.

I plan to post back if this works, but it would be nice to know whether the official XML standard allows it.

UPDATE: See my response below for the result of my test.

+6  A: 

According to the XML specification, a well-formed XML document is:

document ::= prolog element Misc*

where prolog is

prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?

and

XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

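which means that, if you want to have comments at the top, you cannot have an XML declaration: when the declaration is present, it must be the very first thing in the document, so nothing (not even a comment) may precede it.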

The specification agrees with Wikipedia on comments:

2.5 Comments

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor MAY, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments.] Parameter entity references MUST NOT be recognized within comments.

All of this together means that you can put comments anywhere that's not inside other markup, except that you cannot have an XML declaration if you lead with a comment.
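To see how a parser applies these productions, here is a minimal sketch that feeds the four possible layouts to the standard JAXP parser and reports which ones it accepts. The class name `CommentPlacementCheck` and the method `isWellFormed` are made up for illustration, and the result depends on the parser implementation shipped with your JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of DBUnit or the JDK.
class CommentPlacementCheck {

    // Returns true if the JAXP parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decl = "<?xml version='1.0' encoding='UTF-8'?>\n";
        String comment = "<!-- Queries used: ... -->\n";
        String root = "<dataset/>\n";

        System.out.println("comment, declaration, root: " + isWellFormed(comment + decl + root));
        System.out.println("declaration, comment, root: " + isWellFormed(decl + comment + root));
        System.out.println("declaration, root, comment: " + isWellFormed(decl + root + comment));
        System.out.println("comment, root (no declaration): " + isWellFormed(comment + root));
    }
}
```

Going by the grammar, only the first layout (comment before the declaration) should be rejected; the other three match `prolog element Misc*`.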

However, while in theory theory agrees with practice, in practice it doesn't, so I'd be curious to see how your experiment works out.

Anonymouse
+1  A: 

The first example is not well-formed XML: the declaration, when present, has to be the first thing in an XML document.

But besides that, comments can go anywhere else.

Correcting your first example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Queries used: ... -->
<dataset>
</dataset>
Vinko Vrsalovic
+1  A: 

The XML declaration (which looks like a processing instruction) must be the very first thing in the XML content (see XML comment and processing instructions). The following should work:

<?xml version='1.0' encoding='UTF-8'?>
<!-- Queries used: ... -->
<dataset>
  ...
</dataset>
David Schlosnagle
A: 

Thanks for the answers everyone!

As it turns out, placing the comment at the top of the file seemed to work, but when I delved into the DBUnit source, I found that this is only because validation is turned off.

I did try a simple document load via:

import java.io.File;
import javax.xml.parsers.*;
import org.w3c.dom.Document;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("/path/to/file"));

and this fails with an exception because the XML declaration is not the first thing in the file (as others indicated would be the case).

So, while DBUnit would accept the file, I prefer to have well-formed XML, so I moved the comment to the end. Since DBUnit generates the XML declaration itself, placing the comment directly below it is not an option, even though I would prefer that; at least not without modifying the XML after the fact, which would be more work than it is worth.
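For anyone taking the same route, here is a rough sketch of appending the comment after the root element once the dataset file has been written. It is not DBUnit API: `QueryCommentAppender`, `toComment`, and `appendTo` are hypothetical names, and it assumes the file is UTF-8 encoded, as in the examples above. Because the specification forbids "--" inside a comment and SQL queries may contain exactly that, the sketch breaks up runs of hyphens before writing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper class, not part of DBUnit.
class QueryCommentAppender {

    // Builds a single XML comment that lists the queries, one per line.
    // Every hyphen that is followed by another hyphen gets a space after it,
    // so the comment body never contains "--".
    static String toComment(List<String> queries) {
        StringBuilder body = new StringBuilder("<!-- Queries used:");
        for (String query : queries) {
            body.append("\n  ").append(query.replaceAll("-(?=-)", "- "));
        }
        // The newline keeps a trailing hyphen in the last query away from "-->".
        return body.append("\n-->").toString();
    }

    // Appends the comment after the existing content (assumed to be UTF-8),
    // which places it after the closing tag of the root element.
    static void appendTo(Path datasetFile, List<String> queries) throws IOException {
        String text = "\n" + toComment(queries) + "\n";
        Files.write(datasetFile, text.getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
    }
}
```

A call such as `QueryCommentAppender.appendTo(Paths.get("/path/to/file"), queries)` after the export would then leave the generated declaration and data untouched. Note that the hyphen spacing changes the query text slightly, so a query containing an SQL `--` comment cannot be copied back out verbatim.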

Mike Stone