views:

4921

answers:

9

Hi All

I have two XML files of similar structure which I wish to merge into one file. Currently I am using EL4J XML Merge which I came across in this tutorial. However it does not merge as I expect it to for instances the main problem is its not merging the from both files into one element aka one that contains 1, 2, 3 and 4. Instead it just discards either 1 and 2 or 3 and 4 depending on which file is merged first.

So I would be grateful to anyone who has experience with XML Merge if they could tell me what I might be doing wrong or alternatively does anyone know of a good XML API for Java that would be capable of merging the files as I require?

Many Thanks for Your Help in Advance

Edit:

Could really do with some good suggestions on doing this so added a bounty. I've tried jdigital's suggestion but still having issues with XML merge.

Below is a sample of the type of structure of XML files that I am trying to merge.

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

<run xmloutputversion="1.02">
    <info type="b" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>

Expected output

<run xmloutputversion="1.02">
    <info type="a" />
    <debugging level="0" />
    <host starttime="1237144741" endtime="1237144751">
        <status state="down" reason="somereason"/>
        <status state="up" reason="somereason"/>
        <something avalue="test" test="alpha" />
        <target>
            <system name="computer" />
        </target>
        <results>
            <result id="1">
                <state value="test" />
                <service value="gamma" />
            </result>
            <result id="2">
                <state value="test4" />
                <service value="gamma4" />
            </result>
         <result id="3">
                <state value="testagain" />
                <service value="gamma2" />
            </result>
            <result id="4">
                <state value="testagain4" />
                <service value="gamma4" />
            </result>
        </results>
        <times something="0" />
    </host>
    <runstats>
        <finished time="1237144751" timestr="Sun Mar 15 19:19:11 2009"/>
        <result total="0" />
    </runstats>
</run>
A: 

You might be able to write a java app that deserilizes the XML documents into objects, then "merge" the individual objects programmatically into a collection. You can then serialize the collection object back out to an XML file with everything "merged."

The JAXB API has some tools that can convert an XML document/schema into java classes. The "xjc" tool might be able to do this, although I can't remember if you can create classes directly from the XML doc, or if you have to generate a schema first. There are tools out there than can generate a schema from an XML doc.

Hope this helps... not sure if this is what you were looking for.

Andy White
Thanks for you answer its not really what I had in mind but will keep as an option if no one comes up with another solution.
Mark Davidson
A: 

I took a look at the referenced link; it's odd that XMLMerge would not work as expected. Your example seems straightforward. Did you read the section entitled Using XPath declarations with XmlMerge? Using the example, try to set up an XPath for results and set it to merge. If I'm reading the doc correctly, it would look something like this:

XPath.resultsNode=results
action.resultsNode=MERGE
jdigital
I have tried this but its still not working right unfortunately, I will have a look around see if I can find some better documentation for it.
Mark Davidson
+1  A: 

It might help if you were explicit about the result that you're interested in achieving. Is this what you're asking for?

Doc A:
<root>
  <a/>
  <b>
    <c/>
  </b>
</root>

Doc B:
<root>
  <d/>
</root>

Merged Result:
<root>
  <a/>
  <b>
    <c/>
  </b>
  <d/>
</root>

Are you worried about scaling for large documents?

The easiest way to implement this in Java is to use a streaming XML parser (google for 'java StAX'). If you use the javax.xml.stream library you'll find that the XMLEventWriter has a convenient method XMLEventWriter#add(XMLEvent). All you have to do is loop over the top level elements in each document and add them to your writer using this method to generate your merged result. The only funky part is implementing the reader logic that only considers (only calls 'add') on the top level nodes.

I recently implemented this method if you need hints.

tyler
A: 

Have you considered just not bothering with parsing the XML "properly" and just treating the files as big long strings and using boring old things such as hash maps and regular expressions...? This could be one of those cases where the fancy acronyms with X in them just make the job fiddlier than it needs to be.

Obviously this does depend a bit on how much data you actually need to parse out while doing the merge. But by the sound of things, the answer to that is not much.

Neil Coffey
can you garantee that the straight string will regenerate proper XML ? How much validation and testing are you willing to put on that solution vs the "troubble" of using the X tool that will take that in charge ?
Newtopian
If the example files given are representative, and the requirement is as stated, then I think, yes, I can. If there's some hidden part to the problem (files in different formats, a lot of validation required), then the most practical may be to parse "properly".
Neil Coffey
A: 

In addition to using Stax (which does make sense), it'd probably be easier with StaxMate (http://staxmate.codehaus.org/Tutorial). Just create 2 SMInputCursors, and child cursor if need be. And then typical merge sort with 2 cursors. Similar to traversing DOM documents in recursive-descent manner.

StaxMan
+2  A: 

Not very elegant, but you could do this with the DOM parser and XPath:

public class MergeXmlDemo {

  public static void main(String[] args) throws Exception {
    // proper error/exception handling omitted for brevity
    File file1 = new File("merge1.xml");
    File file2 = new File("merge2.xml");
    Document doc = merge("/run/host/results", file1, file2);
    print(doc);
  }

  private static Document merge(String expression,
      File... files) throws Exception {
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xpath = xPathFactory.newXPath();
    XPathExpression compiledExpression = xpath
        .compile(expression);
    return merge(compiledExpression, files);
  }

  private static Document merge(XPathExpression expression,
      File... files) throws Exception {
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
        .newInstance();
    docBuilderFactory
        .setIgnoringElementContentWhitespace(true);
    DocumentBuilder docBuilder = docBuilderFactory
        .newDocumentBuilder();
    Document base = docBuilder.parse(files[0]);

    Node results = (Node) expression.evaluate(base,
        XPathConstants.NODE);
    if (results == null) {
      throw new IOException(files[0]
          + ": expression does not evaluate to node");
    }

    for (int i = 1; i < files.length; i++) {
      Document merge = docBuilder.parse(files[i]);
      Node nextResults = (Node) expression.evaluate(merge,
          XPathConstants.NODE);
      while (nextResults.hasChildNodes()) {
        Node kid = nextResults.getFirstChild();
        nextResults.removeChild(kid);
        kid = base.importNode(kid, true);
        results.appendChild(kid);
      }
    }

    return base;
  }

  private static void print(Document doc) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory
        .newInstance();
    Transformer transformer = transformerFactory
        .newTransformer();
    DOMSource source = new DOMSource(doc);
    Result result = new StreamResult(System.out);
    transformer.transform(source, result);
  }

}

This assumes that you can hold at least two of the documents in RAM simultaneously.

McDowell
This looks promosing although would be better to be more dynmaic.Do you have any good resources for reading more about DOM parser and XPath.
Mark Davidson
There's a pretty good tutorial on devWorks: http://www.ibm.com/developerworks/library/x-javaxpathapi.html
McDowell
A: 

So, you're only interested in merging the 'results' elements? Everything else is ignored? The fact that input0 has an <info type="a"/> and input1 has an <info type="b"/> and the expected result has an <info type="a"/> seems to suggest this.

If you're not worried about scaling and you want to solve this problem quickly then I would suggest writing a problem-specific bit of code that uses a simple library like JDOM to consider the inputs and write the output result.

Attempting to write a generic tool that was 'smart' enough to handle all of the possible merge cases would be pretty time consuming - you'd have to expose a configuration capability to define merge rules. If you know exactly what your data is going to look like and you know exactly how the merge needs to be executed then I would imagine your algorithm would walk each XML input and write to a single XML output.

tyler
Its a bit difficult to make clear using two XML files I might need to post up a few examples its just important that some groups such as nodes and target will either merge or add new elements appropriately. But other stuff like run stats can be left as a single group.
Mark Davidson
A: 

You can try Dom4J which provides a very good means to extract information using XPath Queries and also allows you to write XML very easily. You just need to play around with the API for a while to do your job

Ram
+1  A: 

Thanks to everyone for their suggestions unfortunately none of the methods suggested turned out to be suitable in the end, as I needed to have rules for the way in which different nodes of the structure where mereged.

So what I did was take the DTD relating to the XML files I was merging and from that create a number of classes reflecting the structure. From this I used XStream to unserialize the XML file back into classes.

This way I annotated my classes making it a process of using a combination of the rules assigned with annotations and some reflection in order to merge the Objects as opposed to merging the actual XML structure.

If anyone is interested in the code which in this case merges Nmap XML files please see http://fluxnetworks.co.uk/NmapXMLMerge.tar.gz the codes not perfect and I will admit not massively flexible but it definitely works. I'm planning to reimplement the system with it parsing the DTD automatically when I have some free time.

Mark Davidson