I have code like this:

// namespace is the JAXB context path (colon-separated), e.g. "nfpa:nfpares"
StringWriter writer = new StringWriter();
JAXBContext jc = JAXBContext.newInstance(namespace);
Marshaller marshaller = jc.createMarshaller();
marshaller.marshal(input, writer);

When namespace = "nfpa:nfpares", the generated content looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ServiceRequest xmlns:ns2="nfpares" xmlns="nfpa">
...
</ServiceRequest>

But in another part of the library (which I don't control), the developer also uses JAXBContext with the same namespace, yet the generated content is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ServiceRequest xmlns="nfpa" xmlns:ns2="nfpares">
...
</ServiceRequest>

Notice that the namespaces are the same, but their order has been switched. I need to do an encrypted validation on the raw content, and everything is identical between the two documents except for the order of the namespace declarations. Does anyone know what is happening? Is it because I am using a different version of JAXBContext?

thanks

+2  A: 

When doing cryptographic signature checks on XML, you need to work with the XML in canonical form. The same XML structure can be serialized in many different ways that are all equivalent to an XML parser, for example <a/> vs <a></a>, &#160; vs &#xA0;, or a different order of attributes and namespace declarations. There is a standard for XML canonicalization, which describes the following transformation:

The canonical form of an XML document is physical representation of the document produced by the method described in this specification. The changes are summarized in the following list:

  • The document is encoded in UTF-8
  • Line breaks normalized to #xA on input, before parsing
  • Attribute values are normalized, as if by a validating processor
  • Character and parsed entity references are replaced
  • CDATA sections are replaced with their character content
  • The XML declaration and document type declaration (DTD) are removed
  • Empty elements are converted to start-end tag pairs
  • Whitespace outside of the document element and within start and end tags is normalized
  • All whitespace in character content is retained (excluding characters removed during line feed normalization)
  • Attribute value delimiters are set to quotation marks (double quotes)
  • Special characters in attribute values and character content are replaced by character references
  • Superfluous namespace declarations are removed from each element
  • Default attributes are added to each element
  • Lexicographic order is imposed on the namespace declarations and attributes of each element

An implementation of this method can be found in the Apache XML Security project, in the class Canonicalizer.
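
If adding a dependency is not an option, the XML Digital Signature API that ships with the JDK (javax.xml.crypto.dsig) can also apply inclusive Canonical XML. The following is only a minimal sketch under some assumptions: the class name CanonicalizeDemo and the helper canonicalize are made up for illustration, the namespace URIs are replaced by the hypothetical absolute URIs urn:nfpa and urn:nfpares (Canonical XML does not support relative namespace URIs such as a bare nfpa), and the cast to OctetStreamData relies on what the JDK's default DOM provider returns.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.crypto.Data;
import javax.xml.crypto.OctetStreamData;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;

class CanonicalizeDemo {

    // Hypothetical helper: returns the inclusive Canonical XML 1.0 form of a document.
    static String canonicalize(String xml) {
        try {
            XMLSignatureFactory factory = XMLSignatureFactory.getInstance("DOM");
            // Inclusive C14N without comments; this algorithm takes no parameters.
            CanonicalizationMethod c14n = factory.newCanonicalizationMethod(
                    CanonicalizationMethod.INCLUSIVE, (C14NMethodParameterSpec) null);
            Data input = new OctetStreamData(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // The crypto context may be null when it is not applicable.
            // Assumption: the default DOM provider returns OctetStreamData here.
            OctetStreamData output = (OctetStreamData) c14n.transform(input, null);
            try (InputStream in = output.getOctetStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Canonicalization failed", e);
        }
    }

    public static void main(String[] args) {
        String header = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
        // Same document, namespace declarations in a different order.
        String a = header
                + "<ServiceRequest xmlns:ns2=\"urn:nfpares\" xmlns=\"urn:nfpa\"></ServiceRequest>";
        String b = header
                + "<ServiceRequest xmlns=\"urn:nfpa\" xmlns:ns2=\"urn:nfpares\"></ServiceRequest>";

        System.out.println(canonicalize(a));
        System.out.println(canonicalize(a).equals(canonicalize(b)));
    }
}
```

Both inputs should canonicalize to the same bytes, so the signature or digest check can be run on the canonical form instead of on the raw marshaller output.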

Jörn Horstmann
There's another Canonicalizer implementation in the excellent XOM package (http://www.xom.nu/apidocs/nu/xom/canonical/Canonicalizer.html)
skaffman