Hello!

I want to store some fragments of an XML file in separate files. There seems to be no straightforward way to do this: reading the chunks back fails.

I always get the Exception "javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed."

It only works when there is only ONE 'root' element (which is not the root element in the normal sense).

I understand that XML with multiple 'roots' is not well-formed, but it should be treated as a chunk.

Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?

And if so, can they be read using the standard JDK 6 API?

Test code:

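import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ChunkTest {
    public static void main(String[] args) throws Exception {
        String testChunk1 = "<e1>text</e1>";
        String testChunk2 = "<e1>text</e1><e2>text</e2>";

        // Works with testChunk1; passing testChunk2 instead throws the exception above
        StringReader sr = new StringReader(testChunk1);
        StringWriter sw = new StringWriter();

        // No stylesheet is given, so this is an identity transform (input copied to output)
        TransformerFactory.newInstance().newTransformer().transform(
            new StreamSource(sr), new StreamResult(sw));

        System.out.println(sw.toString());
    }
}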
+1  A: 

There may be some way, perhaps a kludgy one, to do what you want, but I am not aware of it. The standard XML parsers expect well-formed XML, as you're discovering.

If you want to store your XML as a number of separate fragments in different files, then probably the best way to do this is to create your own Reader or InputStream that actually (behind the scenes) reads all of the fragments in order, and then provide that wrapped Reader or InputStream to the transformer. That way, the XML parser sees a single XML document but you can store it however you want.
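A minimal sketch of that idea, assuming the fragments, read in order, add up to one well-formed document. java.io.SequenceInputStream already does the "behind the scenes" concatenation, so no custom stream class is needed here. The class name FragmentJoiner and the in-memory strings standing in for files are made up for illustration; in real use each part would probably be a FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: not part of the JDK, only an illustration.
final class FragmentJoiner {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Presents several fragment streams to the parser as one continuous stream.
    static InputStream join(List<InputStream> parts) {
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Stand-in for opening the fragment files: wraps each string in a stream.
    static List<InputStream> fromStrings(String... fragments) {
        List<InputStream> parts = new ArrayList<InputStream>();
        for (String fragment : fragments) {
            parts.add(new ByteArrayInputStream(fragment.getBytes(UTF8)));
        }
        return parts;
    }

    // Runs the identity transform from the question over the joined stream.
    static String copy(InputStream in) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new StreamSource(in), new StreamResult(sw));
            return sw.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("joined fragments are not a well-formed document", e);
        }
    }

    public static void main(String[] args) {
        // Neither part is well-formed on its own; together they form one document.
        List<InputStream> parts = fromStrings("<root><e1>text</e1>", "<e2>text</e2></root>");
        System.out.println(copy(join(parts)));
    }
}
```

The parser never learns that the input came from several sources, which is the whole point; the price is that the individual files are not usable as XML on their own.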

If you do something like this, the fragments (except for the very first) cannot start with the standard XML header:

<?xml version="1.0" encoding="UTF-8" ?>
Eddie
Actually he could create some InputStream that reads XML files with and without XML headers and combines them into a single XML file that does have a header.
Bombe
I was thinking about that, but this is what I would call a workaround solution ;) I think I will use this as a last resort.
ivan_ivanovich_ivanoff
+2  A: 

The W3C has been working towards defining a standard for XML fragment interchange. I'm mentioning it not because it solves your problem, but because it shows that how to handle such things is under discussion.

In the .NET world you can work with XML fragments and, for example, validate them against a schema. This suggests that it is worth searching for similar support in the Java libraries.

If you want to transform such fragments with XSLT, a very common approach is to put a wrapper element around them, which can then act as the root of the DOM.
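As a rough sketch of the wrapper approach, using only standard JDK classes: the chunk is surrounded with a synthetic root element and run through the same identity transform as in the question, this time into a DOM. The class name ChunkWrapper and the element name "wrapper" are arbitrary choices for illustration, and the chunk is assumed not to start with an XML declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of the JDK, only an illustration.
final class ChunkWrapper {
    // Arbitrary name for the synthetic root element.
    static final String WRAPPER = "wrapper";

    // Wraps the chunk in a single root element and parses the result into a DOM.
    static Document parse(String chunk) {
        String wrapped = "<" + WRAPPER + ">" + chunk + "</" + WRAPPER + ">";
        DOMResult result = new DOMResult();
        try {
            TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(new StringReader(wrapped)), result);
        } catch (TransformerException e) {
            throw new IllegalArgumentException("chunk is not well-formed element content", e);
        }
        // No target node was supplied, so the result node is a new Document.
        return (Document) result.getNode();
    }

    // Lists the chunk's own top-level elements, i.e. the children of the wrapper.
    static List<String> topLevelElementNames(String chunk) {
        List<String> names = new ArrayList<String>();
        NodeList children = parse(chunk).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(topLevelElementNames("<e1>text</e1><e2>text</e2>"));
    }
}
```

When writing the fragment back out, you would serialize the children of the wrapper rather than the wrapper itself.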

Dominic Cronin
The 'current' version of the XML fragment interchange spec is from 2001 and is still a "candidate recommendation" (after 8 years!). So I don't think any further work on it can be expected from the W3C. I could not find any other W3C proposals on a similar topic. Do you know of any? Thanks.
ivan_ivanovich_ivanoff
+1  A: 

Please, before suggesting some work-around-solutions, tell me: Are XML chunks valid at all?

Not in their own right.

You can include them (served as XML external parsed entities) in other documents through methods such as an entity reference, and you can parse them as chunks into existing documents using methods such as DOM Level 3 LS's parseWithContext() (which Java doesn't give you, sorry), but they aren't documents so any interfaces that require a full document cannot accept them.
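To illustrate the entity-reference route, here is a sketch in which a small host document declares the chunk as an external parsed entity and pulls it in with &chunk;. The host document, the system identifier "chunk.xml" and the class name EntityInclude are all invented for the example, and an EntityResolver serves the chunk from memory so the code runs without any files; normally the system identifier would simply point at the fragment file. It also assumes the parser's default settings, which allow DTDs and external entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the JDK, only an illustration.
final class EntityInclude {
    // Host document: declares the chunk as an external entity and references it once.
    static final String HOST =
        "<!DOCTYPE host [<!ENTITY chunk SYSTEM \"chunk.xml\">]>"
        + "<host>&chunk;</host>";

    // Parses the host document, substituting the given chunk for the entity.
    static Document parse(final String chunk) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Serve the chunk from memory instead of reading chunk.xml.
                    if (systemId != null && systemId.endsWith("chunk.xml")) {
                        return new InputSource(new StringReader(chunk));
                    }
                    return null; // fall back to default resolution
                }
            });
            return builder.parse(new InputSource(new StringReader(HOST)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse host document with chunk", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<e1>text</e1><e2>text</e2>");
        // Counts the elements below <host>, i.e. the elements the chunk contributed.
        System.out.println(doc.getDocumentElement().getElementsByTagName("*").getLength());
    }
}
```

The chunk itself stays a chunk; only the host document has to be a full document, which is why the parser accepts it.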

Transformer requires a full document as input because XSLT works on full documents, and would be confused by something that contained zero or more-than-one root element. The usual trick is to create a single root element by wrapping the document in start and end tags, but this does mean you can't have an XML declaration(*), as mentioned by Eddie.

(*: actually it's known as the ‘Text Declaration’ when included in an external parsed entity, but the syntax is exactly the same.)

bobince