I need to translate XMI to OWL (serialized as RDF/XML) in Java. Essentially this is an XML-to-XML translation, and I could probably get by with regexes and replaceAll, but that seems a very messy way to do it. What would you suggest so that it stays easy to customize later (my OWL model might change slightly in the future)?

My idea was to read the XMI into a class hierarchy created according to my OWL model, and then use a template engine to output it as OWL (XML). Do you know of a simpler approach that would still be easy to customize?

+4  A: 

XSL Transformations (XSLT) is perfect for this kind of job; in fact, it was designed for it :-)

To start with XSLT, have a look at the zvon reference and its tutorial.
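Since the question is about Java, here is a minimal sketch of running a stylesheet through the JDK's built-in javax.xml.transform API. The input format (a model element containing class elements), the class name XmiToOwlSketch, and the stylesheet itself are made up for illustration; real XMI is far more involved, and in practice the stylesheet would live in its own file so it can change without recompiling.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmiToOwlSketch {
    // Hypothetical stylesheet: maps each <class name="X"/> to an owl:Class.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
      + " xmlns:owl='http://www.w3.org/2002/07/owl#'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/model'>"
      + "<rdf:RDF><xsl:apply-templates select='class'/></rdf:RDF>"
      + "</xsl:template>"
      + "<xsl:template match='class'>"
      + "<owl:Class rdf:about='uri://example#{@name}'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet text to the given XML text.
    static String transform(String xslt, String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up, heavily simplified stand-in for an XMI document.
        String input = "<model><class name='Foo'/><class name='Bar'/></model>";
        System.out.println(transform(XSLT, input));
    }
}
```

When the OWL model changes, only the stylesheet needs editing; the Java side stays the same.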

rsp
+2  A: 

You could use XSLT to transform XML to XML.

This O'Reilly article is a good place to start.

Robert Christie
A: 

I agree with rsp and cb160 that XSLT is the tool for the job.

If you're on a Unix platform, you could consider xsltproc to test the transformations on the command line. In my experience that can really speed up development if you're not yet at home with XSL.

extraneon
A: 

Hi

The other respondents are right: use XSLT. I'll just add: don't try to use regular expressions. I think I'm correct in saying that XML is not a regular language, and while some regular expression implementations have non-regular features, you'll always be fighting to bend regular expressions to your will.
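A small sketch of the kind of trouble nesting causes, using a made-up input with one pkg element nested inside another (the element names and the class RegexVsParser are purely illustrative). The lazy regex stops at the first closing tag it sees, while an XML parser tracks the nesting properly.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RegexVsParser {
    // Hypothetical input: the outer <pkg> contains an inner <pkg> and a <cls/>.
    static final String XML =
        "<pkg name='outer'><pkg name='inner'></pkg><cls/></pkg>";

    // Naive attempt: grab the body of the first <pkg>...</pkg> pair with a regex.
    static String regexBody(String xml) {
        Matcher m = Pattern.compile("<pkg[^>]*>(.*?)</pkg>").matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // Parser-based: count the element children of the root element.
    static int parserChildCount(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        int count = 0;
        for (Node n = doc.getDocumentElement().getFirstChild();
             n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // The regex pairs the outer start tag with the inner end tag,
        // so the captured "body" is cut short and <cls/> is lost.
        System.out.println("regex body:      " + regexBody(XML));
        // The parser sees both children of the outer <pkg>.
        System.out.println("parser children: " + parserChildCount(XML));
    }
}
```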

Regards

Mark

High Performance Mark
+1  A: 

XMI is not a very good format for direct transformation into OWL: many different structures in XMI have the same meaning (@stereotype="foo", stereotype/@name="foo", and stereotype/@xmi:id="{id of the foo stereotype}" all mean the same thing). I strongly advise using a two-stage process in which the XMI is first transformed into a canonical form, where such references are resolved and any information you don't want to map into OWL is removed.

The XSLT key() function and xsl:key element will prove useful if you're not already familiar with them. Although you can do it in XSLT 1.0 (and I did when there was no other available), working in an XSLT 2.0 processor such as Saxon makes the transform much more concise. The best place to ask XSLT questions is the Mulberry list.
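As a rough illustration of the first (canonicalizing) stage, the sketch below uses xsl:key and key() in XSLT 1.0, run from Java with the JDK's built-in processor. The input is an invented, drastically simplified stand-in for XMI: the element names, the xmi:idref attribute, the namespace URI, and the class name CanonicalizeSketch are all assumptions for the example, not what any particular tool exports. Each class ends up with a plain stereotype attribute, whichever way the reference was originally written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CanonicalizeSketch {
    // Hypothetical input with three spellings of "class has stereotype foo".
    static final String INPUT =
        "<model xmlns:xmi='http://www.omg.org/XMI'>"
      + "<stereotype xmi:id='s1' name='foo'/>"
      + "<class name='A' stereotype='foo'/>"
      + "<class name='B'><stereotype name='foo'/></class>"
      + "<class name='C'><stereotype xmi:idref='s1'/></class>"
      + "</model>";

    // Stage one: rewrite every class to the form <class name=".." stereotype=".."/>.
    static final String CANONICALIZE =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:xmi='http://www.omg.org/XMI'"
      + " exclude-result-prefixes='xmi'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Index the top-level stereotype definitions by their xmi:id.
      + "<xsl:key name='stereotypeById' match='model/stereotype' use='@xmi:id'/>"
      + "<xsl:template match='/model'>"
      + "<model><xsl:apply-templates select='class'/></model>"
      + "</xsl:template>"
      + "<xsl:template match='class'>"
      + "<class name='{@name}'>"
      + "<xsl:attribute name='stereotype'>"
      + "<xsl:choose>"
      + "<xsl:when test='@stereotype'><xsl:value-of select='@stereotype'/></xsl:when>"
      + "<xsl:when test='stereotype/@name'><xsl:value-of select='stereotype/@name'/></xsl:when>"
      // Otherwise resolve the id reference through the key.
      + "<xsl:otherwise><xsl:value-of"
      + " select=\"key('stereotypeById', stereotype/@xmi:idref)/@name\"/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:attribute>"
      + "</class>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(CANONICALIZE, INPUT));
    }
}
```

The second stage (canonical form to OWL) then only has to handle one spelling of each construct.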

There was a tool on SourceForge which did this through a GUI, but I can't seem to find it. My intermediate transforms are owned by a previous employer. For code generation or XMI to XML, I use XSLT directly and the two-stage approach.

Pete Kirkham
A: 

XSLT is designed for processing trees of XML nodes. While there are RDF serializations which are a "tree" of XML nodes (RDF/XML and RDF/XML-Abbrev), the underlying RDF data model is a graph.

If your resulting RDF graph is not also a tree, you're going to have to do dirty things in your XSLT to traverse references, and performance, maintainability, and sanity can suffer. Just be aware of this if you modify the OWL format and then want to convert back to non-RDF XML.

A simple (tree) example is as follows:

## Foo has two types
@prefix e: <uri://example#>.
e:Foo a e:Bar.
e:Foo a e:Baz. # Second statement about e:Foo

For conversions back to non-RDF XML, if you use the most basic RDF/XML form you will get a list of RDF statements immediately under the top level rdf:RDF element. Transforming these can involve searching the entire list of statements over and over.

<rdf:RDF xmlns:e="uri://example#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="uri://example#Foo">
    <rdf:type rdf:resource="uri://example#Baz"/>
  </rdf:Description>
  <rdf:Description rdf:about="uri://example#Foo">
    <rdf:type rdf:resource="uri://example#Bar"/>
  </rdf:Description>
</rdf:RDF>

You might find the RDF/XML-Abbrev format easier to read, but it is not easy to process with XSLT, because RDF's data model is unordered and one graph can have many equivalent XML forms (equivalent as RDF, but incompatible with your XSLT). The example above can be serialized as either of the following:

<!-- Bar is the containing element -->
<rdf:RDF xmlns:e="uri://example#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <e:Bar rdf:about="uri://example#Foo">
    <rdf:type rdf:resource="uri://example#Baz"/>
  </e:Bar>
</rdf:RDF>

<!-- Baz is the containing element -->
<rdf:RDF xmlns:e="uri://example#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <e:Baz rdf:about="uri://example#Foo">
    <rdf:type rdf:resource="uri://example#Bar"/>
  </e:Baz>
</rdf:RDF>

Pete Kirkham's suggestion of creating a canonical form for serialization will aid you in writing XSLTs. In most cases, given the exact same input, an RDF library will serialize the statements to the same format every time, but I would not depend on this in the long run, as data in an RDF graph is unordered.
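To make the idea of a canonical form concrete, here is a rough Java sketch that reads the type statements out of either serialization of the Foo example and stores them sorted by subject. It is an assumption-laden toy (the class name CanonicalTypes is invented, and it only understands rdf:about, rdf:type, and typed node elements directly under rdf:RDF); a real RDF library would be the robust way to do this.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CanonicalTypes {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String HEAD =
        "<rdf:RDF xmlns:e='uri://example#' xmlns:rdf='" + RDF_NS + "'>";

    // Basic form: one rdf:Description per statement.
    static final String FLAT = HEAD
      + "<rdf:Description rdf:about='uri://example#Foo'>"
      + "<rdf:type rdf:resource='uri://example#Baz'/></rdf:Description>"
      + "<rdf:Description rdf:about='uri://example#Foo'>"
      + "<rdf:type rdf:resource='uri://example#Bar'/></rdf:Description>"
      + "</rdf:RDF>";

    // Abbreviated form: Bar is the containing element.
    static final String ABBREVIATED = HEAD
      + "<e:Bar rdf:about='uri://example#Foo'>"
      + "<rdf:type rdf:resource='uri://example#Baz'/></e:Bar>"
      + "</rdf:RDF>";

    // Collects subject -> types into sorted collections, so the result does
    // not depend on statement order or on which serialization was used.
    static Map<String, Set<String>> typesBySubject(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(rdfXml)));
        Map<String, Set<String>> result = new TreeMap<>();
        for (Node n = doc.getDocumentElement().getFirstChild();
             n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element subject = (Element) n;
            Set<String> types = result.computeIfAbsent(
                subject.getAttributeNS(RDF_NS, "about"), k -> new TreeSet<>());
            // A node element other than rdf:Description is itself a type statement.
            boolean isDescription = RDF_NS.equals(subject.getNamespaceURI())
                && "Description".equals(subject.getLocalName());
            if (!isDescription) {
                types.add(subject.getNamespaceURI() + subject.getLocalName());
            }
            NodeList typeElements = subject.getElementsByTagNameNS(RDF_NS, "type");
            for (int i = 0; i < typeElements.getLength(); i++) {
                Element type = (Element) typeElements.item(i);
                types.add(type.getAttributeNS(RDF_NS, "resource"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(typesBySubject(FLAT));
        System.out.println(typesBySubject(ABBREVIATED));
    }
}
```

Both inputs produce the same sorted map, which can then be written out in one fixed layout for the XSLT to consume.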

Phil M