Let's say you have to write an XML-based (no choice) language that will end up as some kind of "standard" format, used by billions of applications around the world, or at least you hope so. That language will be like HTML for the internet, but in another specific domain: something really simple and descriptive that will be interpreted by tools and other applications.

Now let's say you have a basic understanding of how XML works (you know how tags work, that they can have attributes, and that elements can be nested inside elements...). You understand the domain really well, but you have never written a language or XML-based format specification before (other than some basic XML formats for your company's internal tools).

What else do you have to know to do your job right? Maybe some XML-specific language features? Maybe using an XSD file as the specification file?

To sum up: what are the best practices when designing and writing a specification for this kind of language?

+1  A: 

Definitely, you'll want to learn XPath at some point. It's (I think) the best way to select parts of an XML document.
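
As a rough illustration of what XPath-style selection looks like, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath 1.0. The sample document and its element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; every name in it is made up.
DOC = """
<library>
  <books>
    <book id="b1" lang="en"><title>First</title></book>
    <book id="b2" lang="fr"><title>Second</title></book>
  </books>
</library>
"""

root = ET.fromstring(DOC)

# Path expression: every <title> under <books>/<book>, relative to the root.
all_titles = [t.text for t in root.findall("./books/book/title")]

# Attribute predicate: any <book> descendant whose lang attribute is "fr".
french_ids = [b.get("id") for b in root.findall(".//book[@lang='fr']")]

print(all_titles)   # ['First', 'Second']
print(french_ids)   # ['b2']
```

Users of a format tend to write queries like these, so element and attribute names that make the paths short and predictable pay off.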

Avindra Goolcharan
+3  A: 

Firstly, you need to know your problem domain really, really well to make sure your markup can cover all the requirements for those billions of applications. Everything else is secondary. It's not a technology or tools issue.

Trevor Tippins
OK, I should have added that you understand your domain really, really well but have never written a language or XML-based format specification before :D Will add that to the question.
Klaim
If you're going for something that is meant for very broad adoption then, unless you know you're a 110% certified genius, you're probably best to make it open for peer review sooner rather than later to make sure all those odd corner cases don't get omitted.
Trevor Tippins
Yes, think of it as an open "standard" for open and wide usage.
Klaim
Cool. I'd be interested to take a look when you publish it.
Trevor Tippins
You'll be able to. I'll be waiting for feedback and I think there will be several versions before the language is clean enough but that will be really interesting I guess :)
Klaim
Trevor, if you're still interested, I'm looking for people to help review my XML-based language. Our (open-source project) website is not up yet, but if you can help it would be great. By the way, the language is about digital sequences (to make, for example, media-rich webcomics). Please contact me if you're interested! :)
Klaim
+3  A: 

The blog post Using and Abusing XML has some good advice, among other things:

Another popular misuse of XML involves thin-wrapping arbitrary data with XML tags ... such as the following:

<key>Name</key><string>Audiobooks</string>
<key>Playlist ID</key><integer>94</integer>

In a better, tailor-designed XML file format, we’d expect this pair to be something like

<name id="94">Audiobooks</name>
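
To make the difference concrete, here is a small sketch of what a consumer has to do with each shape, using Python's standard-library `xml.etree.ElementTree`. The `<dict>` wrapper around the key/value pairs and both helper functions are assumptions added for the example; they are not taken from the blog post.

```python
import xml.etree.ElementTree as ET

# The <dict> root is an assumption so that the fragment is well-formed.
THIN = (
    "<dict>"
    "<key>Name</key><string>Audiobooks</string>"
    "<key>Playlist ID</key><integer>94</integer>"
    "</dict>"
)
TAILORED = '<name id="94">Audiobooks</name>'

def read_thin(xml_text):
    """Pair each <key> with the sibling that follows it (hypothetical helper)."""
    children = list(ET.fromstring(xml_text))
    # Meaning depends on sibling order, which neither a parser nor a
    # simple schema can check for you.
    return {k.text: v.text for k, v in zip(children[0::2], children[1::2])}

def read_tailored(xml_text):
    """Read the same data straight from the element and its attribute."""
    element = ET.fromstring(xml_text)
    return {"Name": element.text, "Playlist ID": element.get("id")}

print(read_thin(THIN))
print(read_tailored(TAILORED))
```

Both calls yield the same dictionary, but the thin-wrapped form only works because the reader reconstructs the pairing by position.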
hlovdal
+1  A: 

Definitely use a schema, whether it's an XSD or RELAX NG.

Jacob
Use XSD - there's better tool support.
John Saunders
+1  A: 

IBM did a series on Principles of XML Design which holds many truths. The best advice is that there's never one single right way, other than:

  • Be consistent in your design choices: if you choose route A, choose it everywhere. E.g. if you use a wrapper element <books> to hold <book> elements, use a wrapper element everywhere for collections.
  • Be as terse as possible to avoid clutter. XML is supposed to be readable by us humans.
  • Avoid namespaces as much as possible.
  • It HAS to be validatable through a schema.
Martijn Laarman
That's interesting. I'm not aware of the history of XML so may I ask if those docs are still right today?
Klaim
I have to disagree with avoiding namespaces. If you're developing a standard, especially one that will be used along with other XML documents, then you need namespaces. OTOH, you may only need one namespace for the entire XSD.
John Saunders
That one made my eyebrows go up. I'll read it to understand their point.
Klaim
@John Saunders you're right, I didn't mean avoid them altogether, but no one likes working with a namespace tag soup :) @Klaim those docs may be old but are still true today.
Martijn Laarman
+2  A: 

First off, only do something yourself if there really isn't anything else already in existence which could be used instead.

Keep element names short but descriptive.

If at all possible, have a very strict schema which doesn't allow for multiple ways of doing the same thing. This will prevent possible confusion over what is possible or how to interpret the markup.

Be very careful about allowing extensibility as this may allow the problems a strict schema tries to prevent.

Make sure you version your schema, always try to avoid breaking changes, and keep new versions backward compatible.

Ensure you have a validator and other tools available to make using your new language as easy as possible.
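
One possible way to expose the version to consumers, sketched in Python with the standard library. Everything specific here is an assumption made for illustration: a `version` attribute on the root element, a `major.minor` scheme in which only a major bump may break readers, and the `<sequence>` element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical: this reader understands major version 1 of the format.
SUPPORTED_MAJOR = 1

def can_read(xml_text):
    """Return True if the document's major version matches the reader's."""
    root = ET.fromstring(xml_text)
    # Assumption: documents without a version attribute predate
    # versioning and are treated as 1.0.
    version = root.get("version", "1.0")
    major = int(version.split(".")[0])
    return major == SUPPORTED_MAJOR

print(can_read('<sequence version="1.3"/>'))  # True
print(can_read('<sequence version="2.0"/>'))  # False
```

Another common convention is to put the major version in the namespace URI; whichever is chosen, the specification should state what a reader must do with a version it does not know.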

Matt Lacey
+1  A: 

first of all, i agree with trevor: you have to know the area you're covering. nothing is worse than a patched-up standard that looks it.

second, you will need to know at least a little bit about xsd and xslt, and slightly more about xpath/xquery, since users of your standard will likely use these to handle their content.

third, i suggest you dig as deep as you can into other XML-based standards, to see how they were constructed. the XHTML standard is very good for study, since it is one of the older and most widely used XML standards, and its evolution was driven by actual usage over an extended period of time. also, you may want to consider studying atom and rss, xsd (this time as a standard, not a technology), and microformats.

Nir Gavish
+2  A: 
  1. Learn XML Schema
    • Do not try to make your schema convenient by allowing elements in different orders.
    • Make your schema accessible over the Internet. You don't need to host it at a URL that's related to your namespace, but that can be nice.
  2. Learn XML Namespaces
  3. Learn XPath
  4. Understand what an XML Infoset is, and learn what it means to serialize one.
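
A short sketch of the Infoset point, using Python's standard-library `xml.etree.ElementTree`: documents that differ only in serialization details (attribute order, quote style, empty-element syntax, or the choice between a prefix and a default namespace) carry the same names and values once parsed. The sample documents, the `urn:example:demo` namespace, and the `summarize` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Same content, different serializations.
A = '<page number="1" title="Intro"/>'
B = "<page title='Intro'   number='1'></page>"

# Same namespace name and local name, written with and without a prefix.
C = '<p:page xmlns:p="urn:example:demo"/>'
D = '<page xmlns="urn:example:demo"></page>'

def summarize(element):
    """Reduce an element to its name, attributes, text and children."""
    return (
        element.tag,
        dict(element.attrib),   # dict comparison ignores attribute order
        element.text or "",
        [summarize(child) for child in element],
    )

same_content = summarize(ET.fromstring(A)) == summarize(ET.fromstring(B))
same_name = ET.fromstring(C).tag == ET.fromstring(D).tag

print(same_content)  # True
print(same_name)     # True
```

A specification that is written in terms of the parsed information, rather than the exact characters in the file, leaves tools free to serialize however they like.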
John Saunders
+1  A: 
  • Namespaces: What they are, when and when not to use them, how they impact parsing
  • Schema Validation/XSD. One of the advantages of XML is that it's easily verifiable, so I expect anything that calls itself a standard to come with a schema to validate against
  • XPath and other querying mechanisms (XQuery is rare and related to XPath, but still a standard of its own that deserves at least a quick look)
  • General knowledge about escaping special characters, whether through entities, CDATA sections or other means
  • When to use attributes vs. when to use child elements
  • Possible related standards. This is not strictly related, but for example if you need to add Document Signing, there are already standards for that (e.g., XML Signature). Basically every time you add a function, have a quick look if there is already a standard and decide if it's worth adapting it instead. Reinventing the Wheel is okay if you're at least aware why all the other wheels suck.
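
For the escaping item above, a minimal sketch in Python using only the standard library (`xml.sax.saxutils` and `xml.etree.ElementTree`). The `<code>` element and `src` attribute are made-up names; the point is that entity escaping and a CDATA section are two spellings of the same text, and that attribute values need quoting as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = 'if a < b && c > "d"'

# Element content: escape markup characters, or wrap the text in CDATA.
escaped_doc = "<code>" + escape(raw) + "</code>"
cdata_doc = "<code><![CDATA[" + raw + "]]></code>"

# Attribute value: quoteattr escapes the text and adds suitable quotes.
attr_doc = "<code src=" + quoteattr(raw) + "/>"

from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text
from_attr = ET.fromstring(attr_doc).get("src")

# All three round-trip to the original string.
print(from_escaped == raw, from_cdata == raw, from_attr == raw)
```

Because a parser hands back identical text in each case, a specification should not attach meaning to whether an author used CDATA or entities.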
Michael Stum