I'm wondering what the best practices are for storing a relational data structure in XML. In particular, I am wondering about best practices for enforcing node order. For example, say I have three objects: School, Course, and Student, which are defined as follows:

class School
{
    List<Course> Courses;
    List<Student> Students;
}

class Course
{
    string Number;
    string Description;
}

class Student
{
    string Name;
    List<Course> EnrolledIn;
}

I would store such a data structure in XML like so:

<School>
    <Courses>
        <Course Number="ENGL 101" Description="English I" />
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
</School>

With the XML ordered this way, I can parse Courses first. Then, when I parse Students, I can look up each Course listed in EnrolledIn (by its Number) in the School.Courses list. This gives me an object reference to add to the EnrolledIn list in Student. If Students comes before Courses, however, that lookup is not possible, since School.Courses has not yet been populated.

So what are the best practices for storing relational data in XML?

- Should I enforce that Courses must always come before Students?
- Should I tolerate any ordering and create a stub Course object whenever I encounter one I have not yet seen? (To be expanded when the definition of the Course is eventually reached later.)
- Is there some other way I should be persisting/loading my objects to/from XML? (I am currently implementing Save and Load methods on all my business objects and doing all this manually using System.Xml.XmlDocument and its associated classes.)

I am used to working with relational data out of SQL, but this is my first experience trying to store a non-trivial relational data structure in XML. Any advice you can provide as to how I should proceed would be greatly appreciated.

A: 

From experience, XML isn't the best format for storing relational data. Have you investigated YAML? Do you have the option?

If you don't, a safe approach is to define a strict DTD for the XML and enforce the element order that way. You could also, as you suggest, keep a hash of the objects created so far. That way, if a Student refers to a Course that has not been defined yet, you create the Course, keep it in the hash, and fill it in when its definition is reached.
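The "hash of objects" idea can be sketched as a get-or-create registry. This is only a minimal sketch in Python rather than .NET (the same pattern would map onto a dictionary keyed by course number). The names Course, Student, load_school and get_or_create_course are hypothetical, and the sample document deliberately lists Students before Courses.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Course:
    number: str
    description: str = ""  # stays empty while the course is only a stub


@dataclass
class Student:
    name: str
    enrolled_in: list = field(default_factory=list)


def load_school(xml_text):
    """Single pass in document order; tolerates Students before Courses."""
    courses = {}   # course number -> Course (stub or fully defined)
    students = []

    def get_or_create_course(number):
        # Return the existing object, or register a stub to fill in later.
        if number not in courses:
            courses[number] = Course(number)
        return courses[number]

    for section in ET.fromstring(xml_text):
        if section.tag == "Courses":
            for node in section.findall("Course"):
                course = get_or_create_course(node.get("Number"))
                course.description = node.get("Description", "")
        elif section.tag == "Students":
            for node in section.findall("Student"):
                student = Student(node.get("Name"))
                for ref in node.findall("EnrolledIn/Course"):
                    student.enrolled_in.append(get_or_create_course(ref.get("Number")))
                students.append(student)
    return courses, students


SAMPLE = """
<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn><Course Number="CHEM 102" /></EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
    </Courses>
</School>
"""

courses, students = load_school(SAMPLE)
```

Because the student holds a reference to the same object that is later filled in, Jack's course picks up its description once the Courses section is reached. A course that is referenced but never defined would simply remain a stub, so a real loader would probably want to check for that at the end.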

Also remember you can use XPath queries to access specific nodes directly, so you can enforce parsing of courses first regardless of position in the XML document. (making a more complete answer, thanks to dacracot)

Instantsoup
I have heard of YAML, but not used it. Does it have built in support in .NET?
Joseph Sturtevant
There is a SourceForge .NET implementation of YAML: http://yaml-net-parser.sourceforge.net/default.html
Instantsoup
+2  A: 

While you can specify order of child elements using a <xsd:sequence>, by requiring child objects to come in specific order you make your system less flexible (i.e., harder to update using notepad).

The best thing to do is to parse all your data first, then perform whatever actions you need. Don't act during the parse.


Obviously, the design of the XML and the data behind it precludes serializing a single POCO to XML. You need to control the serialization and deserialization logic in order to unhook and re-hook objects together.

I'd suggest creating a custom serializer that builds the xml representation of this object graph. It can thereby control not only the order of serialization, but also handle situations where nodes aren't in the expected order. You could do other things such as adding custom attributes to use for linking objects together which don't exist as public properties on the objects being serialized.

Creating the xml would be as simple as iterating over your objects a few times, building up collections of XElements with the expected representation of the objects as xml. When you're done you can stitch them together into an XDocument and grab the xml from it. You can make multiple passes over the xml on the reverse side to re-create your object graph and restore all references.
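A rough sketch of that round trip, in Python with xml.etree.ElementTree standing in for XElement/XDocument. The plain-dictionary object model and the function names school_to_xml and school_from_xml are assumptions made for illustration, not part of any library.

```python
import xml.etree.ElementTree as ET


def school_to_xml(courses, students):
    """courses: {number: description}; students: {name: [course numbers]}."""
    school = ET.Element("School")
    courses_el = ET.SubElement(school, "Courses")
    for number, description in courses.items():
        ET.SubElement(courses_el, "Course", Number=number, Description=description)
    students_el = ET.SubElement(school, "Students")
    for name, numbers in students.items():
        student_el = ET.SubElement(students_el, "Student", Name=name)
        enrolled_el = ET.SubElement(student_el, "EnrolledIn")
        for number in numbers:
            # Only the key is written here; the description lives under <Courses>.
            ET.SubElement(enrolled_el, "Course", Number=number)
    return ET.tostring(school, encoding="unicode")


def school_from_xml(xml_text):
    root = ET.fromstring(xml_text)
    # Pass 1: every course definition, wherever <Courses> sits in the file.
    courses = {c.get("Number"): c.get("Description")
               for c in root.findall("Courses/Course")}
    # Pass 2: students, with each reference checked against pass 1.
    students = {}
    for s in root.findall("Students/Student"):
        numbers = [c.get("Number") for c in s.findall("EnrolledIn/Course")]
        missing = [n for n in numbers if n not in courses]
        if missing:
            raise ValueError(f"{s.get('Name')} references unknown courses: {missing}")
        students[s.get("Name")] = numbers
    return courses, students
```

Since the loader makes one pass per element type over the in-memory tree, it does not care which section the writer emitted first.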

Will
Good idea. How is that usually implemented? Do you create an entire intermediate data structure to hold the values of your parse before converting them to business objects?
Joseph Sturtevant
Using a DOM parse rather than a SAX parse you will have the entire structure in memory and can then perform random access queries against it using XPath.
dacracot
Oh yes you can enforce order of the children in XML using an XML Schema (XSD) and specifying a sequence.
Andrei Rinea
Thanks, investigated and updated. I only created my first XSD this past month...
Will
A: 

The order is not usually important in XML. In this case the Courses could come after Students. You parse the XML and then you make your queries on the entire data.

tpower
A: 

XML is definitely not a friendly place for relational data.

If you absolutely need to do this, then I'd recommend a funky inverted kind of logic.

In your example, you've got a School, which offers many Courses, taken by many Students.

Your XML might look like this:

<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" Description="English I" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
    </Students>
</School>

This obviously isn't the least repetitive way to do this (it's relational data!), but it's easy to parse.
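If the loaded objects should still share one Course per course number, the repeated definitions can be collapsed while parsing. The following is a minimal Python sketch, not a .NET implementation; load_students is a hypothetical name, and courses are kept as plain dictionaries for brevity.

```python
import xml.etree.ElementTree as ET

DENORMALIZED = """
<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
        <Student Name="Jill">
            <EnrolledIn>
                <Course Number="ENGL 101" Description="English I" />
                <Course Number="MATH 103" Description="Trigonometry" />
            </EnrolledIn>
        </Student>
    </Students>
</School>
"""


def load_students(xml_text):
    courses = {}    # course number -> the one shared course record
    students = {}   # student name -> list of shared course records
    for student in ET.fromstring(xml_text).findall("Students/Student"):
        enrolled = []
        for node in student.findall("EnrolledIn/Course"):
            # setdefault keeps the first copy seen, so repeats collapse
            # into a single shared object.
            enrolled.append(courses.setdefault(node.get("Number"), dict(node.attrib)))
        students[student.get("Name")] = enrolled
    return courses, students


courses, students = load_students(DENORMALIZED)
```

One trade-off of this layout is that a course nobody is enrolled in cannot be represented at all, and conflicting descriptions for the same number are silently resolved in favor of the first one read.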

Pete Karl II
See below where you can do a foreign key lookup using XPath within XML.
dacracot
+2  A: 

Don't think in SQL or relational terms when working with XML, because there are no inherent order constraints.

You can, however, use XPath to query any portion of the XML document at any time. If you want the courses first, query "//Courses/Course". If you want the students' enrollments next, query "//Students/Student/EnrolledIn/Course".

The bottom line being... just because XML is stored in a file, don't get caught thinking all your accesses are serial.
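To illustrate, here is a small Python sketch (Python's xml.etree.ElementTree is used in place of the .NET classes, and it supports only a subset of XPath). The enrollments function and the two sample documents are made up for the example; the point is that the same queries return the same result whichever section comes first.

```python
import xml.etree.ElementTree as ET

COURSES = '<Courses><Course Number="MATH 103" Description="Trigonometry" /></Courses>'
STUDENTS = ('<Students><Student Name="Jill"><EnrolledIn>'
            '<Course Number="MATH 103" /></EnrolledIn></Student></Students>')


def enrollments(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree spells the document-wide "//" as ".//" relative to the root.
    descriptions = {c.get("Number"): c.get("Description")
                    for c in root.findall(".//Courses/Course")}
    result = []
    for student in root.findall(".//Students/Student"):
        for ref in student.findall("EnrolledIn/Course"):
            result.append((student.get("Name"), descriptions[ref.get("Number")]))
    return result


courses_first = f"<School>{COURSES}{STUDENTS}</School>"
students_first = f"<School>{STUDENTS}{COURSES}</School>"
```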


I posted a separate question, "Can XPath do a foreign key lookup across two subtrees of an XML?", in order to clarify my position. The solution shows how you can use XPath to make relational queries against XML data.
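As a hedged illustration of that kind of lookup (this is not the solution from the linked question, just a Python sketch using the attribute predicates that xml.etree.ElementTree supports), each enrollment's Number can be used as a key into the Courses subtree. course_definition is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
            </EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="ENGL 101" Description="English I" />
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
    </Courses>
</School>
""")


def course_definition(root, number):
    # Naive interpolation: assumes the course number contains no single quote.
    return root.find(f".//Courses/Course[@Number='{number}']")


jack = root.find(".//Student[@Name='Jack']")
jack_courses = [course_definition(root, ref.get("Number")).get("Description")
                for ref in jack.findall("EnrolledIn/Course")]
```

Each lookup rescans the Courses subtree, so for large documents a dictionary built once from the course definitions would likely be the better choice.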

dacracot
A: 

Node ordering is only important if you need to do forward-only processing of the data, e.g. using an XmlReader or a SAX parser. If you're going to read the XML into a DOM before processing it (which you are if you're using XmlDocument), node order doesn't really matter. What matters more is that the XML be structured so that you can query it with XPath efficiently, i.e. without having to use "//".
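For the forward-only case, one option is to record unresolved references as they stream past and resolve them once the document ends. The sketch below uses Python's xml.etree.ElementTree.iterparse as a stand-in for XmlReader or SAX; stream_enrollments and the sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

STUDENTS_FIRST = b"""<School>
    <Students>
        <Student Name="Jack">
            <EnrolledIn>
                <Course Number="CHEM 102" />
                <Course Number="MATH 103" />
            </EnrolledIn>
        </Student>
    </Students>
    <Courses>
        <Course Number="CHEM 102" Description="General Inorganic Chemistry" />
        <Course Number="MATH 103" Description="Trigonometry" />
    </Courses>
</School>"""


def stream_enrollments(xml_bytes):
    descriptions = {}   # course number -> description, filled as definitions pass
    pending = []        # (student name, course number) pairs awaiting resolution
    path = []           # tags of the currently open elements
    student = None
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, node in events:
        if event == "end":
            path.pop()
            node.clear()  # release finished elements to keep memory flat
            continue
        parent = path[-1] if path else None
        path.append(node.tag)
        if node.tag == "Student":
            student = node.get("Name")
        elif node.tag == "Course" and parent == "Courses":
            descriptions[node.get("Number")] = node.get("Description")
        elif node.tag == "Course":
            pending.append((student, node.get("Number")))
    # Resolve only after the whole document has been seen.
    return [(name, descriptions[number]) for name, number in pending]
```

With this approach the reader never needs the Courses section to arrive first; the cost is holding the pending references in memory until the end of the stream.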

If you take a look at the schema that the DataSetGenerator produces, you'll see that there's no ordering associated with the DataTable-level elements. It may be that ADO processes elements in some sequence not represented in the schema (e.g. one DataTable at a time), or it may be that ADO does forward-only processing and doesn't enforce relational constraints until the DataSet is fully read. I don't know. But it's clear that ADO doesn't couple the processing order to the document order.

(And yes, you can specify the order of child elements in an XML schema; that's what xs:sequence does. If you don't want node order to be enforced, you use an unbounded xs:choice.)

Robert Rossney
A: 

You could also use two XML files, one for courses and a second for students. Open and parse the first before you do the second.

Jim C
A: 

It's been a while, but I seem to remember defining a base collection of 'things' in one part of an XML file and referring to them in another using the schema features keyref and refer. I found a few examples here. My apologies if this is not what you're looking for.

Nerdfest