ansaurus

Question

Answer 1

+2 A:

Your problem translates to this: you have an XML file that doesn't match the schema and you want to transform it into something that's valid.

With XSOM, you can read the structure of the XSD and perhaps analyze the XML, but you would still need an additional mapping from the invalid form to the valid form. Using a stylesheet would be much easier, because you would walk through the XML using XPath expressions to handle the elements in the proper order. With an XML where you want apples before pears, the stylesheet would first copy the apple node (/Fruit/Apple) before it copies the pear node. That way, no matter what the order was in the old file, they would be in the correct order in the new file.
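A minimal sketch of such a fixed-order stylesheet, assuming a `Fruit` root with `Apple` and `Pear` children (the example names above) and using the JDK's standard `javax.xml.transform` API. The `FruitSorter` class and `reorder` method are made-up names for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FruitSorter {
    // Hand-written XSLT 1.0: whatever order the input has, Apple elements are
    // copied first, then Pear elements (other children of Fruit are not copied).
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/Fruit'>"
        + "<Fruit><xsl:copy-of select='Apple'/><xsl:copy-of select='Pear'/></Fruit>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String reorder(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `FruitSorter.reorder("<Fruit><Pear/><Apple/></Fruit>")` should yield a `Fruit` element with `Apple` ahead of `Pear`.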

What you could do with XSOM is read the XSD and generate the stylesheet that will re-order the data, then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can simply re-use that stylesheet until the XSD is modified or another XSD is needed.

Of course, you could use XSOM to copy the nodes directly in the right order. But since your code would then have to walk through all nodes and child nodes itself, it might take some time to finish. A stylesheet would do the same, but the transformer will be able to process it all faster: it can work directly on the data, while the Java code would have to get/set every node through the XMLDocument properties.

So, I would use XSOM to generate, from the XSD, a stylesheet that simply copies the XML node by node, and re-use that stylesheet over and over again. The stylesheet would only need to be rewritten when the XSD changes, and it would perform faster than having the Java API walk through the nodes itself. The stylesheet doesn't care about the order of the input, so the output would always end up in the right order.
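In JAXP terms, "generate once, re-use until the XSD changes" maps naturally onto compiling the stylesheet into a `javax.xml.transform.Templates` object, which is documented as thread-safe and can hand out a fresh `Transformer` per document. A minimal sketch, with a hypothetical `CachedReorderer` wrapper; the XSLT string passed in is assumed to be the generated one:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CachedReorderer {
    private final Templates compiled;

    // Compile the (generated) stylesheet once; rebuild this object only when the XSD changes.
    CachedReorderer(String xslt) {
        try {
            compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each call gets its own Transformer from the shared, already-compiled Templates.
    String apply(String xml) {
        try {
            StringWriter out = new StringWriter();
            compiled.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The stylesheet parsing and compilation cost is then paid once, not once per document.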

To make it more interesting, you could skip XSOM entirely and write a stylesheet that reads the XSD and generates another stylesheet from it. This generated stylesheet would copy the XML nodes in the exact order defined in the schema. Would it be complex? The generating stylesheet would need to produce a template for every element and make sure the child elements of that element are processed in the correct order.
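A minimal Java sketch of what such a generated stylesheet could look like, assuming the schema's element order has already been extracted into a map from element name to child names (how it is obtained is left open here). Assumptions, repeated in the code: no namespaces, each element name has one content model, and element-only content. `StylesheetGenerator` is a hypothetical name:

```java
import java.util.List;
import java.util.Map;

final class StylesheetGenerator {
    // Builds an XSLT 1.0 stylesheet from "element name -> child element names in
    // schema order". ASSUMES: unqualified (no-namespace) names, one content model
    // per element name, element-only content for the elements in the map.
    static String generate(Map<String, List<String>> childOrder) {
        StringBuilder x = new StringBuilder();
        x.append("<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>");
        x.append("<xsl:output method='xml' omit-xml-declaration='yes'/>");
        // Fallback for elements not in the map: copy the element and its attributes,
        // then process its children (elements and text) in document order.
        x.append("<xsl:template match='*'><xsl:copy><xsl:copy-of select='@*'/>")
         .append("<xsl:apply-templates/></xsl:copy></xsl:template>");
        for (Map.Entry<String, List<String>> e : childOrder.entrySet()) {
            // One template per mapped element: process its children one name at a time,
            // in schema order. Children not listed (and text) are dropped.
            x.append("<xsl:template match='").append(e.getKey()).append("'>")
             .append("<xsl:copy><xsl:copy-of select='@*'/>");
            for (String child : e.getValue()) {
                x.append("<xsl:apply-templates select='").append(child).append("'/>");
            }
            x.append("</xsl:copy></xsl:template>");
        }
        x.append("</xsl:stylesheet>");
        return x.toString();
    }
}
```

Because a template matching a plain name has a higher default priority than `*`, the per-element templates win wherever they apply.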

Thinking about it, I wonder whether this has been done before. It would be very generic and would be able to handle almost every XSD/XML.

Let's see... Using "//xsd:element/@name" you would get all element names in the schema. Every unique name would need to be translated to a template. Within each template, you would need to process the child nodes of that specific element, which are slightly harder to get. An element can have a reference, which you would need to follow; otherwise, take all of its child xsd:element nodes.
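The XPath step can be tried with the JDK's `javax.xml.xpath` API. This sketch handles only the simple case described above: named declarations whose children sit in an anonymous `xsd:complexType/xsd:sequence`, with a `ref` taken as the name of the referenced global element. Choices, groups, named types and extensions are deliberately ignored, and `SchemaOrderReader` is a hypothetical name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SchemaOrderReader {
    // Binds the "xsd" prefix used in the XPath expressions to the XML Schema namespace.
    private static final NamespaceContext NS = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "xsd".equals(prefix) ? XMLConstants.W3C_XML_SCHEMA_NS_URI : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
    };

    // Returns "element name -> child element names in declared order".
    // ASSUMPTION: only xsd:complexType/xsd:sequence/xsd:element is understood.
    static Map<String, List<String>> read(String xsd) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the xsd: prefix to match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(NS);
            Map<String, List<String>> order = new LinkedHashMap<>();
            NodeList decls = (NodeList) xp.evaluate("//xsd:element[@name]", doc, XPathConstants.NODESET);
            for (int i = 0; i < decls.getLength(); i++) {
                Element decl = (Element) decls.item(i);
                NodeList kids = (NodeList) xp.evaluate(
                    "xsd:complexType/xsd:sequence/xsd:element", decl, XPathConstants.NODESET);
                if (kids.getLength() == 0) continue; // leaf element: nothing to reorder
                List<String> names = new ArrayList<>();
                for (int j = 0; j < kids.getLength(); j++) {
                    Element kid = (Element) kids.item(j);
                    // ref="p:name" points at a global declaration; keep its local part.
                    String name = kid.hasAttribute("name") ? kid.getAttribute("name") : kid.getAttribute("ref");
                    names.add(name.substring(name.indexOf(':') + 1));
                }
                order.putIfAbsent(decl.getAttribute("name"), names); // first declaration wins
            }
            return order;
        } catch (Exception e) { // parser configuration, SAX, IO or XPath errors
            throw new IllegalStateException(e);
        }
    }
}
```

Because the prefix is bound through a `NamespaceContext`, the expression matches whether the schema document itself uses `xs:` or `xsd:`.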

Workshop Alex 2009-09-16 21:10:58

Yep, that's the way to go.

JG 2009-09-16 21:18:06

OK, cool, we're both on the same page now :) I agree that an XSL transform would re-arrange my document more efficiently than manually poking around in the DOM, but the initial problem of using the XSOM API to find out what the order *should* be remains, regardless of the mechanism I use to perform the re-ordering itself.

skaffman 2009-09-16 22:01:30

I suddenly wonder whether it's possible to use a stylesheet to transform an XSD into an XML-copying stylesheet. That would make an interesting cross-platform solution. If you're already familiar with XSDs and XSLT, this might be easier than having to learn more about XSOM.

Workshop Alex 2009-09-16 22:06:08

I dunno, schemas can be fearsomely complex, especially the ones I'm working with.... extended types, substitution groups, all that stuff. Scary.

skaffman 2009-09-16 22:12:07

Keep an eye open for this Q: http://stackoverflow.com/questions/1437443/ ;-)

Workshop Alex 2009-09-17 08:22:17

Answer 2

+3 A:

I don't have a good answer to this yet, but I have to note that there is potential for ambiguity there. Consider this schema:

<xs:element name="root">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:element name="foo"/>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dee"/>
              <xs:element name="dum"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:sequence>
        <xs:element name="bar">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="dum"/>
              <xs:element name="dee"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="foo"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>

and this input XML:

<root>
  <foo/>
  <bar>
    <dum/>
    <dee/>
  </bar>
</root>

This could be made to comply with the schema either by reordering <foo> and <bar>, or by reordering <dee> and <dum>. There doesn't seem to be any reason to prefer one over another.

Pavel Minaev 2009-09-16 22:16:58

Well spotted, that's a fair point. In my case, however, I know that such an ambiguity wouldn't arise, since every `<bar>` would have the same schema type, with the same child ordering.

skaffman 2009-09-16 22:20:48

Good point (+1), but how common would such constructions be? And why would someone use such a construction?

Workshop Alex 2009-09-17 08:24:47

Answer 3

+1 A:

Basically you want to take the root element and from there recursively look at the children in the document and the children defined in the schema and make the order match.

I'll give you a C#-syntax solution, since that's what I code in day and night; it's pretty close to Java. Note that I'll have to guess at XSOM, since I don't know its API. I've also made up the XML DOM methods, since giving you the C# ones probably wouldn't help :)

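// assume the first call is SortChildrenIntoNewDocument( sourceDom.DocumentElement, targetDom.DocumentElement, schema.RootElement )
// SchemaElement, GetChildren(), SelectChildByName() and AddShallowCopy() are made-up placeholders, not real XSOM or .NET calls

public void SortChildrenIntoNewDocument( XmlElement source, XmlElement target, SchemaElement schemaElement )
{
    // whatever method you use to ask XSOM for the expected children, in schema order
    SchemaElement[] orderedChildren = schemaElement.GetChildren();
    for( int i = 0; i < orderedChildren.Length; i++ )
    {
        XmlElement sourceChild = source.SelectChildByName( orderedChildren[ i ].Name );
        if( sourceChild == null )
            continue; // optional element that isn't present in the source document
        // copy the element's name, attributes and text, but not its child elements,
        // so the recursive call can add those in schema order
        XmlElement targetChild = target.AddShallowCopy( sourceChild );
        // recursive call
        SortChildrenIntoNewDocument( sourceChild, targetChild, orderedChildren[ i ] );
    }
}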

I wouldn't recommend a recursive method if the tree is going to be deep; in that case you would have to create some 'tree walker' type objects. The advantage of that approach is that you'll be able to handle more complex cases: for example, when the schema says you can have 0-or-more of an element, you can keep processing source nodes until there are no more that match, then move the schema walker on from there.
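For comparison, a Java sketch of the same recursive idea using only the standard `org.w3c.dom` API. Since XSOM's API isn't shown here, the schema order is assumed to be a plain map from element name to child names (hypothetical input). The DOM is reordered in place rather than copied, and every matching child is moved, which covers `maxOccurs > 1` but not `xs:choice`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class DomReorderer {
    // childOrder: element name -> child element names in schema order (ASSUMED input,
    // e.g. built by walking the schema). Reorders the subtree under parent in place.
    static void sortChildren(Element parent, Map<String, List<String>> childOrder) {
        // Snapshot the element children first: appendChild() moves nodes around.
        List<Element> kids = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) kids.add((Element) n);
        }
        List<String> order = childOrder.getOrDefault(parent.getTagName(), Collections.<String>emptyList());
        for (String name : order) {
            // Move every child with this name to the end, keeping their relative order,
            // so repeated elements (maxOccurs > 1) stay together.
            for (Element kid : kids) {
                if (kid.getTagName().equals(name)) parent.appendChild(kid);
            }
        }
        // Children not named in the order are never moved, so they end up in front.
        for (Element kid : kids) sortChildren(kid, childOrder);
    }
}
```

As noted above, a very deep document would call for an explicit stack or tree walker instead of recursion.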

Timothy Walters 2009-09-16 22:21:54

It's not as simple as that, because there's no `getChildren()`. As I understand it, there could be things like `xs:choice` or `maxOccurs > 1`, so there may not even be a single specific element as the Nth child; it would be "X or Y or ...", arbitrarily long.

Pavel Minaev 2009-09-16 22:26:08

I figured it might be like that (given the nature of XSD), so the 2nd option I mentioned is the only way to go really. If no-one else comes up with a solution I can post an example of how it would work.

Timothy Walters 2009-09-17 20:40:04


Using a schema to sort an XML document
