I'm working with OSIS (Open Scriptural Information Standard), an XML schema for describing scripture and related text. When I first looked at a sample of the XML, I noticed some oddities that I have not seen in XML before: mainly, tags are closed and then followed by content that would logically belong inside them. After looking through the documentation, I found that they call this type of markup "Milestones."

In this instance it is being used because a quote can begin in one verse and span several verses before being closed. It seems like a hack, and I am going to have to write code to parse, search through, and display sections of the XML for the web. While I understand that this is technically valid XML, it can't be [easily] verified against the schema for correctness, and standard XML parsing APIs will not be able to grab the content between milestones. I believe there are better ways that this "standard" could have been formed. What are your thoughts on this type of markup? I haven't really found any other references to this practice. Where else is it used? Is it valid?

From the documentation...

In XML the normal form of an element is a start tag and an end tag: <q>...</q>. For handling markup that crosses boundaries, however, a special form must be used. It consists of two totally empty instances of the same element type: one to mark the starting point, and one to mark the ending point. The two empty elements identify themselves as to which is the start and which is the end, and co-identify themselves by an sID attribute (the start of the traditional element) and an eID attribute (the end of the traditional element), the values of which must match.

Empty elements are indicated in XML by a tag with "/" preceding the final ">": thus <q/> rather than <q> or </q>. Elements used in this way are commonly called ‘milestones,’ and those particular elements in OSIS that permit this alternate encoding are thus called ‘milestoneable.’

Here is a short example...

<verse osisID="Acts.7.2" sID="a72"/>To this he replied:
<speech who="Stephan">Brothers and fathers, listen to me! The God of glory appeared
to our father Abraham while he was still in Mesopotamia, before he lived in Haran
<verse eID="a72"/>

...

<verse osisID="Acts.7.6" sID="a76"/>God spoke to him in this way: <q
type="embedded" marker="'">Your descendants will be strangers in a country not
their own, and they will be enslaved and mistreated four hundred years. <verse
eID="a76"/>
<verse osisID="Acts.7.7" sID="a77"/>But I will punish the nation they serve as
slaves,</q> God said, <q type="embedded" marker="'">and afterward they will come out
of that country and worship me in this place.</q><verse eID="a77"/>

...

<verse osisID="Acts.7.53" sID="a79"/>you who have received the law that was put
into effect through angels but have not obeyed it.
<verse eID="a79"/>
</speech>
+1  A: 

There's nothing illegal about this markup, at least as far as XML syntax is concerned.

It's a clever solution to the problem of having a string of text that must be broken up into segments under two overlapping schemes. You omitted the enclosing tags, so it's impossible to intuit the hierarchical structure, but I will assume it's there and that it makes some attempt to organize the text logically as a narrative. Then there's the need to indicate where the verse breaks are, and in general those can be totally arbitrary with respect to that hierarchy. They are really point-events in the flow (their term: milestones).
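To illustrate the "point-events in the flow" view, here is a minimal sketch in Python that walks the document in order and collects the text that falls between each pair of verse milestones. It is only a sketch under some assumptions: the `events` and `verse_texts` helpers are made up for this example, the sample is a trimmed version of the Acts 7 snippet wrapped in a hypothetical `<div>`, and it ignores the XML namespace that real OSIS files declare.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the enclosing <div> is an assumption, not from the question.
SAMPLE = """<div>
<verse osisID="Acts.7.6" sID="a76"/>God spoke to him in this way: <q type="embedded">Your descendants will be strangers. <verse eID="a76"/>
<verse osisID="Acts.7.7" sID="a77"/>But I will punish the nation,</q> God said.<verse eID="a77"/>
</div>"""


def events(elem):
    """Yield ('start', sID), ('end', eID) or ('text', str) in document order."""
    if elem.tag == "verse" and "sID" in elem.attrib:
        yield ("start", elem.get("sID"))
    elif elem.tag == "verse" and "eID" in elem.attrib:
        yield ("end", elem.get("eID"))
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        # Text that follows a child's closing tag lives in child.tail.
        if child.tail:
            yield ("text", child.tail)


def verse_texts(root):
    """Map each sID to the text between its start and end milestones."""
    verses = {}
    current = None  # sID of the verse we are currently inside, if any
    for kind, value in events(root):
        if kind == "start":
            current = value
            verses[current] = ""
        elif kind == "end":
            current = None
        elif current is not None:
            verses[current] += value
    # Collapse runs of whitespace left over from line breaks in the source.
    return {sid: " ".join(text.split()) for sid, text in verses.items()}


result = verse_texts(ET.fromstring(SAMPLE))
for sid, text in result.items():
    print(sid, "->", text)
```

Note that the `<q>` element crosses the verse boundary in the sample, yet each verse still gets the right text, because the walk treats the hierarchy and the milestones as two independent streams.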

The only thing I'd disagree with is having both 'start' and 'end' markers for verses. This introduces a potential for errors, since the pairing of start and end markers can't be validated within XML itself. I'd have used only 'start' markers. This assumes, of course, that the end of every verse corresponds to the start of another, or to the end of a hierarchical section. That is, it must not be possible to have something 'between' two verses.
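Since the schema can't enforce the pairing, a check outside the schema is one way to catch mistakes. The following is a rough sketch in Python, assuming verses never nest or overlap one another and that the element is named `verse` with no namespace; `check_milestones` is a hypothetical helper written for this answer, not part of any OSIS tooling.

```python
import xml.etree.ElementTree as ET


def check_milestones(xml_text, tag="verse"):
    """Return a list of problems found in sID/eID pairing for one element type."""
    problems = []
    open_id = None  # sID of the milestone that is currently open
    seen = set()    # every sID encountered so far
    # iter() visits matching elements in document order.
    for elem in ET.fromstring(xml_text).iter(tag):
        sid, eid = elem.get("sID"), elem.get("eID")
        if sid is not None:
            if sid in seen:
                problems.append(f"duplicate sID {sid!r}")
            seen.add(sid)
            if open_id is not None:
                problems.append(f"{sid!r} starts before {open_id!r} ends")
            open_id = sid
        elif eid is not None:
            if eid != open_id:
                problems.append(f"eID {eid!r} does not close open verse {open_id!r}")
            open_id = None
    if open_id is not None:
        problems.append(f"{open_id!r} is never closed")
    return problems


# Usage: one well-paired document and one with a mismatch and a dangling start.
GOOD = '<div><verse sID="a"/>x<verse eID="a"/><verse sID="b"/>y<verse eID="b"/></div>'
BAD = '<div><verse sID="a"/>x<verse eID="b"/><verse sID="c"/>y</div>'

good_problems = check_milestones(GOOD)
bad_problems = check_milestones(BAD)
print(good_problems)
print(bad_problems)
```

With start-only markers, as suggested above, most of this check would be unnecessary: only the uniqueness of the identifiers would remain to verify.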

Jim Garrison
While I agree it is legal markup, it makes it difficult/impossible to validate in a standard XML editor.
pdavis
What specific constraints would you like to validate? It might be possible to express them in a way that can be implemented in XPath or XSL.
Jim Garrison