The element tasklist may contain at most one title and at most one description, plus any number (including zero) of task elements, all in any order.

The naive approach is not applicable, since the order should not matter:

<!ELEMENT tasklist (title?, description?, task*) >

Alternatively, I could explicitly name all possible options:

(title, description?, task*) |
(title, task+, description?, task*) |
(task+, title, task*, description?, task*) |
(description, title?, task*) |
(description, task+, title?, task*) |
(task+, description, task*, title?, task*) |
(task*)

but then it's quite easy to write a non-deterministic rule, and furthermore it looks like the direct path to darkest madness. Any ideas on how this could be done more elegantly?

And no, XSD or RELAX NG is not an option. I need a plain old DTD.

A: 

Why is the order unimportant?

It seems to me that the order is rather important here, and that the only sensible order is title?, description?, task*.

Flexibility is nice and all, but sometimes it's just not necessary.

Williham Totland
"Why is the order unimportant?" My problem is, that the DTD must match some existing XML, that is part human, part machine generated. XML files with `task*, description?, title?` work in the existing consuming application exactly like the ones the other way round, so the DTD should (must?) reflect this.
Boldewyn
Actually, it doesn't need to reflect this at all. The consuming application can very well be more lenient than the DTD; it only needs to be changed so that it updates old files as appropriate when it encounters ones that work but are invalid. That way new files are created with a sensible structure while old files continue to work, and attempting to validate them gives an error, prompting whoever does the validation to revisit and repair the file.
Williham Totland
+3  A: 

This summarises what you need:

<!ELEMENT tasklist (task*, ((title?, task*, description?) |
                    (description?, task*, title?)), task*)>

The alternation covers the title appearing before or after the description.

However, this is not a deterministic content model, as @13ren explains in his answer. [Here is another example from Microsoft](http://msdn.microsoft.com/en-us/library/9bf3997x(VS.71).aspx).
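As a rough illustration of what "deterministic" means here, the sketch below applies the usual test: number every occurrence of an element name in the content model, then check that a parser never has to choose between two occurrences of the same name for the next child. The constructors sym, seq, alt, star and opt and the function is_deterministic are hypothetical names made up for this example, not part of any XML library, and real validators may implement the check differently.

```python
from itertools import count

# Tiny constructors for a content-model tree (hypothetical helper names).
def sym(name): return ("sym", name)
def seq(*parts): return ("seq", parts)
def alt(*parts): return ("alt", parts)
def star(part): return ("star", part)
def opt(part): return ("opt", part)

def is_deterministic(model):
    """True if no parser state can reach two occurrences of one element name."""
    names, follow, ids = {}, {}, count()

    def walk(node):
        """Return (nullable, first positions, last positions) of a subtree."""
        kind, body = node
        if kind == "sym":
            pos = next(ids)
            names[pos], follow[pos] = body, set()
            return False, {pos}, {pos}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for part in body:
                n, f, l = walk(part)
                for pos in last:
                    follow[pos] |= f      # end of the prefix flows into this part
                if nullable:
                    first |= f
                last = (last | l) if n else l
                nullable = nullable and n
            return nullable, first, last
        if kind == "alt":
            results = [walk(part) for part in body]
            return (any(r[0] for r in results),
                    set().union(*(r[1] for r in results)),
                    set().union(*(r[2] for r in results)))
        n, f, l = walk(body)              # "star" or "opt"
        if kind == "star":
            for pos in l:
                follow[pos] |= f          # the loop can restart
        return True, f, l

    _, first, _ = walk(model)

    def clash(positions):
        seen = [names[pos] for pos in positions]
        return len(seen) != len(set(seen))

    return not clash(first) and not any(clash(f) for f in follow.values())

task, title, description = sym("task"), sym("title"), sym("description")

# (title?, description?, task*) from the question.
naive = seq(opt(title), opt(description), star(task))
# (task*, ((title?, task*, description?) | (description?, task*, title?)), task*)
any_order = seq(star(task),
                alt(seq(opt(title), star(task), opt(description)),
                    seq(opt(description), star(task), opt(title))),
                star(task))

print("naive:", is_deterministic(naive))
print("any_order:", is_deterministic(any_order))
```

The any-order model fails already at the first child: a leading task could belong to the first task*, to the task* inside either branch, or to the trailing task*.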

In short

The content model above is non-deterministic, so it cannot be used as-is in a valid DTD for your scenario.

Alternatives

If you can accept a simple restriction, namely that when both title and description are provided, whichever of the two comes second must be the last element, you can use this deterministic DTD declaration:

<!ELEMENT tasklist (
  task*,
  ((title, task*, description?) | 
  (description, task*, title?))?
)>

Examples:

<!-- Valid -->
<tasklist>
  <task></task>
  <task></task>
  <task></task>
  <title></title>
  <task></task>
  <description></description>
</tasklist>
<!-- Valid -->
<tasklist>
  <title></title>
  <task></task>
  <task></task>
  <task></task>
</tasklist>
<!-- Invalid
<tasklist>
  <task></task>
  <title></title>
  <task></task>
  <description></description>
  <task></task>
</tasklist>
-->
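If no validating parser is at hand, the examples above can be sanity-checked with a quick sketch that treats the content model as a regular expression over the names of the child elements. This is only an approximation of DTD validation: the regex is a hand translation of the declaration, and MODEL and matches_model are names invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of
#   (task*, ((title, task*, description?) | (description, task*, title?))?)
# Every child name is followed by one space so names cannot run together.
MODEL = re.compile(
    r"(?:task )*"
    r"(?:title (?:task )*(?:description )?"
    r"|description (?:task )*(?:title )?)?"
)

def matches_model(xml_text):
    """Check the order of the root element's direct children against MODEL."""
    root = ET.fromstring(xml_text)
    children = "".join(child.tag + " " for child in root)
    return MODEL.fullmatch(children) is not None

title_then_tasks = "<tasklist><title/><task/><task/><task/></tasklist>"
task_after_description = (
    "<tasklist><task/><title/><task/><description/><task/></tasklist>"
)

print(matches_model(title_then_tasks))
print(matches_model(task_after_description))
```

The check looks only at the order of direct children; attributes, text content and nested elements are ignored.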

Or, possibly more naturally, require that a title or description element comes first, and that title and description are either both present or both absent.

<!ELEMENT tasklist (
  ((title, task*, description) | 
  (description, task*, title))?,
  task*
)>

Examples:

<!-- Valid -->
<tasklist>
  <title></title>
  <task></task>
  <description></description>
  <task></task>
  <task></task>
</tasklist>
<!-- Invalid
<tasklist>
  <task></task>
  <title></title>
  <description></description>
  <task></task>
  <task></task>
</tasklist>

<tasklist>
  <title></title>
  <task></task>
  <task></task>
  <task></task>
</tasklist>
-->
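To get a feel for how much this second declaration gives up compared with the original requirement, one could enumerate short child sequences and list those that satisfy "at most one title, at most one description" but are still rejected. The sketch below does that by brute force; the regex is again a hand translation of the declaration, and allowed, accepts and rejected_but_allowed are hypothetical helper names.

```python
import re
from itertools import product

NAMES = ("title", "description", "task")

def allowed(children):
    """The question's requirement: at most one title and one description."""
    return children.count("title") <= 1 and children.count("description") <= 1

# Hand translation of
#   (((title, task*, description) | (description, task*, title))?, task*)
TITLE_FIRST = re.compile(
    r"(?:title (?:task )*description |description (?:task )*title )?(?:task )*"
)

def accepts(pattern, children):
    """True if the sequence of child names matches the whole pattern."""
    text = "".join(name + " " for name in children)
    return pattern.fullmatch(text) is not None

def rejected_but_allowed(pattern, max_len=4):
    """Sequences the requirement permits but the content model refuses."""
    return [children
            for length in range(max_len + 1)
            for children in product(NAMES, repeat=length)
            if allowed(children) and not accepts(pattern, children)]

for children in rejected_but_allowed(TITLE_FIRST, max_len=2):
    print(" ".join(children))
```

Raising max_len makes the list grow quickly, so this is only practical for short sequences.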

Otherwise

If neither restriction is acceptable, you need to use RELAX NG, which allows non-deterministic models.

sirhc
I get this error with XMLStarlet (based on libxml2): `validity error : Content model of tasklist is not determinist`.
Boldewyn
Hi there, I've made corrections to my answer. Please take a look. :)
sirhc