I think the solution to my problem is very easy, but I couldn't find it. So, here it is:

I have an XML document that contains a list of elements with different names, but in sequence. An example:

<DOC>
 <DOC_OBL_1>
  <TIP_DOC_OBL>1</TIP_DOC_OBL> 
 </DOC_OBL_1>
 <DOC_OBL_2>
  <TIP_DOC_OBL>2</TIP_DOC_OBL> 
 </DOC_OBL_2>
 <DOC_OBL_3>
  <TIP_DOC_OBL>3</TIP_DOC_OBL>  
 </DOC_OBL_3>
</DOC>

So, I have 3 elements: DOC_OBL_1, DOC_OBL_2 and DOC_OBL_3. And yes, there could be number 4, 5, 6, etc. As you can see, all 3 have the same elements inside (actually, they have a lot of them, but they aren't important right now), and I thought I could declare a general type which could validate this kind of document.

How can I validate this with a schema?

I know it's a very ugly XML (maybe it isn't standard, please tell me, I don't know), but building this document isn't up to me. I just have to parse it, validate it and transform it.

+1  A: 

Well, sure you can! Pretty simple actually: if the structure is the same for each element, you can define a single <xs:complexType> to validate that, and then use it as the type of each element:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="DOC" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DOC">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="DOC_OBL_1" type="DocType" />
        <xs:element name="DOC_OBL_2" type="DocType" />
        <xs:element name="DOC_OBL_3" type="DocType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="DocType">
    <xs:sequence>
      <xs:element name="TIP_DOC_OBL" type="xs:string" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Does that work for you? Does it handle all your needs?

As Zach points out quite correctly, this "solution" is obviously rather limited, since it can't deal with an arbitrary number of tags DOC_OBL_1, DOC_OBL_2, ..., DOC_OBL_x: the names, and thus the number, of the tags must be known ahead of time.

This is unfortunate, but it's the only solution, given this crippled XML. The REAL solution would be to have something like:

<DOC>
  <DOC_OBL id="1">
  </DOC_OBL>
  <DOC_OBL id="2">
  </DOC_OBL>
  .....
  <DOC_OBL id="x">
  </DOC_OBL>
</DOC>

and then the XML schema would become even easier and could deal with any number of <DOC_OBL> tags.
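
To illustrate, here is a minimal sketch of what such a schema could look like. It assumes the id attribute is required and always holds a positive integer, and it reuses the DocType content model from the schema above; neither detail comes from the real document, so adjust as needed:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DOC">
    <xs:complexType>
      <xs:sequence>
        <!-- A single declaration covers any number of DOC_OBL elements -->
        <xs:element name="DOC_OBL" type="DocType"
                    minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="DocType">
    <xs:sequence>
      <xs:element name="TIP_DOC_OBL" type="xs:string" minOccurs="0" />
    </xs:sequence>
    <!-- Assumption: the sequence number lives in a required numeric attribute -->
    <xs:attribute name="id" type="xs:positiveInteger" use="required" />
  </xs:complexType>
</xs:schema>
```

The maxOccurs="unbounded" on DOC_OBL is what removes the need to know the number of elements ahead of time.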

But the GIGO principle applies: Garbage In, Garbage Out ==> crappy XML structure comes in, only a crappy, incomplete validation is possible.

Marc

marc_s
@marc_s If there could be any number of DOC_OBL_1, DOC_OBL_2, ..., DOC_OBL_N nodes, is there a way to validate against the schema using some sort of regex on the element name?
Zach Bonham
No, that's unfortunately not possible, I'm afraid. You can only have the actual tag name as an element name; otherwise you'd need a fixed tag name with the variable part (the sequential number) in an attribute of that tag.
marc_s
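
If changing the producer of the document is not an option, one possible workaround is to rename the elements yourself before validating. The following XSLT 1.0 stylesheet is only a sketch under that assumption: it rewrites every DOC_OBL_N child of DOC into a DOC_OBL element with the number in an id attribute (the attribute name id is taken from the example in the answer above), and copies everything else unchanged:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Identity template: copy anything not matched by a more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Rename DOC_OBL_N to DOC_OBL and keep N in an id attribute -->
  <xsl:template match="/DOC/*[starts-with(name(), 'DOC_OBL_')]">
    <DOC_OBL id="{substring-after(name(), 'DOC_OBL_')}">
      <xsl:apply-templates select="@*|node()" />
    </DOC_OBL>
  </xsl:template>
</xsl:stylesheet>
```

The transformed output can then be validated against a schema with one repeatable DOC_OBL declaration. Note that the stylesheet does not check that the suffix is actually a number; that would be left to the schema's type for the id attribute.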
Thank you. I think I'm going to do this with at least 50 elements (DOC_OBL_1 ... DOC_OBL_50). Ugly input -> ugly solution. :D But it should work, anyway.
eLZahR
+2  A: 

It's unfortunate that the XML element names basically have sequence numbers/identifiers in them. I would say that's poorly defined (non-standard) XML.

In my limited (!) experience, this means that the XSD schema would have to have all the possible "DOC_OBL_N" elements defined in the sequence. This is probably not practical if there is no theoretical upper limit to their number.

As long as it's well-formed XML, you could load it, count all the children of the DOC element and then write the schema on the fly, but that sounds self-defeating.

That may leave you with manually validating the XML instance using some XPath expressions: kind of a brute-force approach, and not technically validating against an XSD schema.
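
For what it's worth, writing the schema on the fly could itself be done with XSLT. This is a rough sketch only, assuming every child of DOC shares one content model (called DocType here, with a single optional TIP_DOC_OBL child as in the question's sample); it emits one element declaration per child found in the instance:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/DOC">
    <xs:schema>
      <xs:element name="DOC">
        <xs:complexType>
          <xs:sequence>
            <!-- One declaration per child actually present in the instance -->
            <xsl:for-each select="*">
              <xs:element name="{name()}" type="DocType" />
            </xsl:for-each>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <!-- Assumed shared content model; extend with the real child elements -->
      <xs:complexType name="DocType">
        <xs:sequence>
          <xs:element name="TIP_DOC_OBL" type="xs:string" minOccurs="0" />
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
  </xsl:template>
</xsl:stylesheet>
```

Because the element names are taken from the instance itself, the generated schema can never reject a wrongly named element; it only checks the content of each one, which is why the approach is largely self-defeating.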
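
A minimal sketch of that brute-force check, written as an XSLT 1.0 stylesheet so the XPath tests have somewhere to run. It assumes the only rule to enforce is that every child of DOC is named DOC_OBL_ followed by digits, and it prints one line per offending element:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <xsl:for-each select="/DOC/*">
      <!-- Everything after the fixed prefix should consist of digits only -->
      <xsl:variable name="suffix"
                    select="substring-after(name(), 'DOC_OBL_')" />
      <xsl:if test="not(starts-with(name(), 'DOC_OBL_'))
                    or $suffix = ''
                    or translate($suffix, '0123456789', '') != ''">
        <xsl:text>Unexpected element: </xsl:text>
        <xsl:value-of select="name()" />
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Empty output means every child name fits the pattern. The content of each element is not checked here; that would need further tests or a separate schema.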

Zach Bonham
The input is a huge document, so more code than a simple schema isn't viable.
eLZahR