To set the scene - I work in one of those industries that loves estimating and tracking pretty much everything. One of our key metrics is SLOC (source lines of code - declarative and executable statements). We use it for project size and cost estimation, project planning, and many other things. We try to use it to compare apples to apples (i.e., we don't compare SLOC in one language/domain to SLOC in another language/domain). NOTE: We don't evaluate individual developers on this metric, nor do we call something wrong or bad just because its SLOC is different than expected. We do, however, consider that a project with more SLOC is likely to also have more bugs.

Fairly recently, I've started working on projects that use libraries in place of components that would otherwise have been hand-coded - for example, JSF instead of JSP, Hibernate instead of JDBC, etc. So... instead of writing lines of code, our team is developing XML files. The XML mappings still take effort, and there's still a rough correlation with complexity - a project with 100x as many of these XML configuration files probably took more effort to create, and is probably harder to debug, than a project with a hundredth as many.

So... does anyone have any suggestions for measuring the size of these XML configuration files? # of elements? # elements + # attributes? something else?

A: 

Let me start by saying that evaluating a project based on criteria like this is just as goofy as evaluating a programmer on the same basis. I know there are studies showing a clear correlation between number of lines of code and number of code defects. In my opinion, that is simply a matter of increased scale.

Having said that, if your overlords...err...I mean management requires you to come up with something, here are some relatively easy measurements to make (a rough sketch of computing them follows the list):

  • Total number of nodes
  • Number of node types
  • Average attributes per node
  • Greatest level of nesting
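
For illustration, here's a rough sketch (not from the original answer) of how those four numbers could be pulled out of a single XML file with the JDK's built-in DOM parser - the class name and output format are made up:

// Hypothetical sketch: compute the four metrics listed above for one XML file.
import java.io.File;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlSizeMetrics {

    private int totalElements = 0;
    private int totalAttributes = 0;
    private int maxDepth = 0;
    private final Set<String> elementNames = new HashSet<>();

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(args[0]));
        XmlSizeMetrics m = new XmlSizeMetrics();
        m.visit(doc.getDocumentElement(), 1);

        System.out.println("Total nodes (elements):      " + m.totalElements);
        System.out.println("Node types (distinct names): " + m.elementNames.size());
        System.out.println("Average attributes per node: "
                + (double) m.totalAttributes / m.totalElements);
        System.out.println("Greatest level of nesting:   " + m.maxDepth);
    }

    // Depth-first walk over element nodes, accumulating the four counters.
    private void visit(Element e, int depth) {
        totalElements++;
        elementNames.add(e.getTagName());
        totalAttributes += e.getAttributes().getLength();
        maxDepth = Math.max(maxDepth, depth);

        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                visit((Element) child, depth + 1);
            }
        }
    }
}

Run per file and summed across a project, these give a crude but repeatable size figure, roughly analogous to a SLOC count.
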
EBGreen
It's not so much that we're *required* to as that I don't want the effort of developing XML configuration to go overlooked. Theoretically, the XML files save us some degree of complexity, but they still take work.
bethlakshmi
A: 

Well, if you're just mapping schema to structure in terms of your basic SOAP/WSDL operations, messages, and types, then you can probably just equate each of these aspects to its respective method, message, and class.

For example... a customer schema like this:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="http://tempuri.org"
           targetNamespace="http://tempuri.org"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="tns:LastNameType" />
      </xs:sequence>
      <xs:attribute name="CustID" type="xs:positiveInteger"
                    use="required" />
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="LastNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

...would equate to a Customer class like this...

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    // etc.
}

This way you can continue to use your current SLOC benchmarks, just on a relative scale.
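
As a rough illustration of that relative-scale idea (again, not part of the original answer), one could count the schema constructs directly and treat xs:complexType/xs:simpleType declarations as "classes" and xs:element/xs:attribute declarations as "members"; the equivalence follows the answer's suggestion, but the code and weighting are made up:

// Hypothetical sketch: count schema declarations as a proxy for classes/members.
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SchemaSloc {

    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new File(args[0]));

        int types = doc.getElementsByTagNameNS(XSD_NS, "complexType").getLength()
                  + doc.getElementsByTagNameNS(XSD_NS, "simpleType").getLength();
        int elements   = doc.getElementsByTagNameNS(XSD_NS, "element").getLength();
        int attributes = doc.getElementsByTagNameNS(XSD_NS, "attribute").getLength();

        // Roughly: one "class" per type, one "member" per element/attribute.
        System.out.println("Types (~classes):               " + types);
        System.out.println("Elements/attributes (~members): " + (elements + attributes));
    }
}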

The problem with this is that writing XML schema does not really allow for the big variation in LOC that writing a Java or C# program does. A programmer can write a C# class in a million different ways, whereas a schema definition is far more structured and only really allows for variation in the length of operation, message, and variable names. Therefore, if you are just writing XML now instead of Java or C#, you may want to consider that your SLOC metric is going to hold a lot less water than it used to in terms of determining project size and bugs.

matt_dev
To be clear - my concern is not the complexity of custom-developed schemas. This is reuse of schemas defined by open source products. The complexity of the schema and the complexity of how we use it are different things.
bethlakshmi
+3  A: 

Interesting question. The only metric I am aware of (other than just counting nodes and attributes as you suggest) is something called the Structured Document Complexity Metric.

http://www.oreillynet.com/xml/blog/2006/05/metrics_for_xml_projects_5_str_1.html

That's the best link I can find on it at the moment (it's been a while). I also found this little tool, which will apparently calculate it for you (there may be others):

http://schematron.com/resources/documentcomplexitymetric.html

Beyond that, I'm afraid my only advice would be to pick a couple of metrics that seem reasonable to track, and then re-evaluate them to see whether they actually trend with the effort being applied to each document...
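
For example (a made-up sketch, not from the answer), that re-evaluation can be as simple as recording effort per file alongside a candidate metric and checking how well the two track each other:

// Hypothetical sketch: does a candidate metric actually trend with effort?
// The per-file data below is invented purely for illustration.
public class MetricEffortCheck {

    // Plain Pearson correlation over two equal-length samples.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double cov  = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] totalNodes  = { 40, 120, 300, 75, 510 };  // metric per XML file
        double[] effortHours = {  3,   8,  20,  5,  35 };  // recorded effort
        System.out.printf("correlation = %.2f%n", pearson(totalNodes, effortHours));
    }
}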

Brian B.