I have been trying to parse an XSD file (actually, many of them) and write out a list of the element names, each element's type, and its documentation.

I looked into XSOM, SAXParser, Xerces, and JAXP, all of which make it easy to read an XML file and walk its nodes. Reading an XSD without knowing the element names in advance (so that I can list all of them) seems difficult. parser.parse works fine with most of the libraries I tried, since an XSD is well-formed XML, but I cannot get beyond that point to extract all the element names.

Am I missing something? Does anyone have experience with a similar problem?

The following is a sample XSD:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://abc.mycompany.com/dto/address" targetNamespace="http://abc.mycompany.com/sdo/address">
  <xs:complexType name="Address">
    <xs:sequence> 
      <xs:element name="address1" minOccurs="0">
        <xs:annotation>
          <xs:documentation>USPS standardized address: building number, street name, apartment/suite number, and directionals (e.g., NE, SE, NW, SW).</xs:documentation>
        </xs:annotation>
        <xs:simpleType>
          <xs:restriction base="xs:normalizedString">
            <xs:maxLength value="100" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="address2" minOccurs="0">
        <xs:annotation>
          <xs:documentation>Additional field for wrapping long addresses.</xs:documentation>
        </xs:annotation>
        <xs:simpleType>
          <xs:restriction base="xs:normalizedString">
            <xs:maxLength value="100" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="city" minOccurs="0">
        <xs:annotation>
          <xs:documentation>Name of the city, town or village.</xs:documentation>
        </xs:annotation>
        <xs:simpleType>
          <xs:restriction base="xs:normalizedString">
            <xs:maxLength value="26" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="state" type="xs:normalizedString" minOccurs="0" >
        <xs:annotation>
          <xs:documentation>A pick list of two-letter abbreviations representing US states,
                              military post offices, US protectorates, and Canadian provinces.
      </xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="zipCode" type="xs:normalizedString" minOccurs="0" >
        <xs:annotation>
          <xs:documentation>The first 5 digits of a 9-digit (Zip+4) zip code,
                    used to geographically locate a US address.</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
+1  A: 

You need a parser that is configured to be namespace-aware, because the schema elements live in the xs namespace (http://www.w3.org/2001/XMLSchema).
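
As a rough illustration of the namespace point, here is a minimal sketch using the JAXP DOM API that ships with the JDK. The class name XsdElementLister, the method listElements, and the "name | type | documentation" output format are made up for this example. It assumes every element of interest is declared with a name attribute, and it labels inline simpleType/complexType definitions as "(anonymous type)" instead of resolving them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class XsdElementLister {

    /** Returns one "name | type | documentation" row per xs:element in the schema text. */
    static List<String> listElements(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is off by default; without it the *NS lookups below find nothing.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));

            // "http://www.w3.org/2001/XMLSchema", whatever prefix the file binds to it.
            String xs = XMLConstants.W3C_XML_SCHEMA_NS_URI;
            NodeList elements = doc.getElementsByTagNameNS(xs, "element");

            List<String> rows = new ArrayList<>();
            for (int i = 0; i < elements.getLength(); i++) {
                Element e = (Element) elements.item(i);
                String name = e.getAttribute("name");
                String type = e.getAttribute("type");
                if (type.isEmpty()) {
                    type = "(anonymous type)"; // inline type definition, not resolved here
                }
                // Simplification: this takes the first xs:documentation descendant, so an
                // element without its own annotation could pick up a nested element's text.
                NodeList docs = e.getElementsByTagNameNS(xs, "documentation");
                String text = docs.getLength() > 0
                        ? docs.item(0).getTextContent().trim().replaceAll("\\s+", " ")
                        : "";
                rows.add(name + " | " + type + " | " + text);
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalStateException("Could not parse schema", ex);
        }
    }
}
```

To run it against a file, read the file into a string first (for example with java.nio.file.Files.readString) and print each returned row.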

You could also use XSLT to match on "xs:element" and extract the names that way.
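
A sketch of the XSLT route, driven from Java through the javax.xml.transform API in the JDK. The stylesheet is an illustrative XSLT 1.0 example written for this answer, and the class name XsdElementNamesXslt and the tab-separated output layout are assumptions. It prints the name, the type attribute (empty for inline types), and the whitespace-normalized documentation of every xs:element that has a name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of any library.
class XsdElementNamesXslt {

    // XSLT 1.0: one tab-separated line per named xs:element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//xs:element[@name]'>"
        + "<xsl:value-of select='@name'/>"
        + "<xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='@type'/>"
        + "<xsl:text>&#9;</xsl:text>"
        + "<xsl:value-of select='normalize-space(xs:annotation/xs:documentation)'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the schema text and returns the plain-text report. */
    static String transform(String xsd) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xsd)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException ex) {
            throw new IllegalStateException("Transformation failed", ex);
        }
    }
}
```

The stylesheet binds the xs prefix to the schema namespace itself, so it matches the schema elements regardless of which prefix the XSD file uses. The same stylesheet could be saved to a file and run with any XSLT 1.0 processor instead of being embedded in Java.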

duffymo
+1 Most definitely, use XSLT. A simple stylesheet can solve this problem quickly without a lot of extra work. However, the OP's question is a bit under-specified -- for example, there's a fair bit of complexity in specifying how you want to display the "respective element type", as well as how elements nest inside complex types.
Jim Garrison