I'm looking for a generic algorithm that can flatten an XML file into a table, given multiple XPath expressions. Everything I've tried has failed because of the way the available XPath engines are implemented.

Given this XML:

<A Name="NameA">
<B Name="NameB1">
 <C Name="NameC1"/>
 <C Name="NameC2"/>
 <C Name="NameC3"/>
</B>
<B Name="NameB2">
 <C Name="NameC4"/>
 <C Name="NameC5"/>
 <C Name="NameC6"/>
</B>
</A>

and the following XPath expressions as input:

/A/@Name
/A/B/@Name
/A/B/C/@Name

The output should be a table in the following form:

NameA NameB1 NameC1

NameA NameB1 NameC2

NameA NameB1 NameC3

NameA NameB2 NameC4

NameA NameB2 NameC5

NameA NameB2 NameC6

I've been trying to produce this table with the available Java XML packages, such as javax.xml.xpath, JDOM, etc., without success.

It seems like the

XPath.evaluate("/A/B/C/@Name", doc, XPathConstants.NODESET);

code will return a "detached" Node which cannot be traversed.

I've tried many ways of recursing over the Nodes returned by XPath evaluation, to no avail. I also thought of a DFS traversal of the DOM tree, but again all XPath evaluators seem to return detached Nodes, where node.getParent() always returns 'null'.

Any ideas for a "multi-XPath expression aware" algorithm which can keep track of nested XPath expressions?

I have a feeling this is possible easily with XSLT but my XSLT skills are pretty rusty...

A: 

EDIT: The same thing, but with XPath:

        XPathFactory f = XPathFactory.newInstance();
        XPath xPath = f.newXPath();
        NodeList list = (NodeList) xPath.evaluate("//*[* and not(*/*)]/*", new InputSource(stream), XPathConstants.NODESET);

        for (int i = 0; i < list.getLength(); i++) {
            Node n = list.item(i);
            Stack<Node> s = new Stack<Node>();

            while (n != null) {
                s.push(n);
                n = n.getParentNode();
            }

            s.pop(); //this is document root, we don't need it

            while (s.size() > 0) {
                NamedNodeMap map = s.pop().getAttributes();

                for (int j = 0; j < map.getLength(); j++) {
                    Node node = map.item(j);
                    System.out.print(node.getNodeName() + ": " + node.getTextContent() + " ");
                }
            }

            System.out.println("");
        } 

You can use regular DOM functions. It is not as nice as XPath, but generic and will work with any XML file.

If I understand you right, then this code will do the trick:

    String xml = "<A Name=\"NameA\">\n" +
            "<B Name=\"NameB1\">\n" +
            "        <C Name=\"NameC1\"> </C>\n" +
            "        <C Name=\"NameC2\"/>\n" +
            "        <C Name=\"NameC3\"/>\n" +
            "</B>\n" +
            "<B Name=\"NameB2\">\n" +
            "        <C Name=\"NameC4\"/>\n" +
            "        <C Name=\"NameC5\"/>\n" +
            "        <C Name=\"NameC6\"/>\n" +
            "</B></A>";
    try {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes()));

        Queue<Node> q = new LinkedList<Node>();

        // Start from the root element; getFirstChild() could be a comment or processing instruction
        q.add(doc.getDocumentElement());
        //start BFS
        while (q.size() > 0) {
            Node n = q.poll();
            NodeList childNodes = n.getChildNodes();
            //add all children of current node
            int elemNodes = 0;
            for (int i = 0; i < childNodes.getLength(); i++) {
                Node node = childNodes.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    elemNodes++;
                    q.add(node);
                }
            }
            //if node has no children, print its path
            if (elemNodes == 0) {
                Stack<Node> s = new Stack<Node>();

                while (n != null) {
                    s.push(n);
                    n = n.getParentNode();
                }

                s.pop(); //this is document root, we don't need it

                while (s.size() > 0)
                    System.out.print(s.pop().getAttributes().getNamedItem("Name").getTextContent() + " ");

                System.out.println("");
            }
        }
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
tulskiy
Great response Piligrim! That's a nice approach to simply traverse the whole DOM tree :) But my problem is that the XML I'd be dealing with might have expressions like:
/A/@Name
/A/B/@AnotherName
/A/B/C/D/E/@ADifferentName
so the "Name" constant won't work :(
YarinB
Name is not a constant, you can get any attribute of the node.
tulskiy
ok, I've changed the code to handle any attribute.
tulskiy
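For the case raised in the comments above, where each column has its own XPath expression and attribute name, here is a minimal sketch of one possible approach. It is not code from this thread. It assumes every expression selects attribute nodes, that the last expression is the deepest one (it drives the rows), and that each expression matches at most one attribute per element. The class and method names (XPathFlattener, flatten, parse, isAncestorOrSelf) are made up for illustration. It relies on the DOM rule that an Attr has no parent node but does expose its element through getOwnerElement():

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathFlattener {

    // True if 'ancestor' is 'node' itself or one of its ancestors.
    static boolean isAncestorOrSelf(Node ancestor, Node node) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n.isSameNode(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: all expressions select attributes, and the LAST one is the deepest.
    static List<List<String>> flatten(Document doc, String... expressions) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<NodeList> columns = new ArrayList<>();
        for (String expression : expressions) {
            columns.add((NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET));
        }

        List<List<String>> rows = new ArrayList<>();
        NodeList deepest = columns.get(columns.size() - 1);
        for (int i = 0; i < deepest.getLength(); i++) {
            // Attr.getParentNode() is null by design; getOwnerElement() gives the element.
            Node leafElement = ((Attr) deepest.item(i)).getOwnerElement();
            List<String> row = new ArrayList<>();
            for (NodeList column : columns) {
                String value = ""; // empty cell if no match lies on the leaf's ancestor chain
                for (int j = 0; j < column.getLength(); j++) {
                    Attr candidate = (Attr) column.item(j);
                    if (isAncestorOrSelf(candidate.getOwnerElement(), leafElement)) {
                        value = candidate.getValue();
                        break;
                    }
                }
                row.add(value);
            }
            rows.add(row);
        }
        return rows;
    }

    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<A Name='NameA'><B Name='NameB1'><C Name='NameC1'/><C Name='NameC2'/></B>"
                   + "<B Name='NameB2'><C Name='NameC4'/></B></A>";
        for (List<String> row : flatten(parse(xml), "/A/@Name", "/A/B/@Name", "/A/B/C/@Name")) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Each row is built by matching every column's results against the ancestor chain of the deepest attribute's element, so the attribute names can differ per level. The inner loops are quadratic in the number of matches, which should be fine for small documents but would need an index for large ones.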
A: 

I would expect you could do it with XSLT2. (If you are limited to XSLT1 then I am not sure). See http://www.xml.com/pub/a/2003/11/05/tr.html for a tutorial. You can have multiple group-by instructions and they all take XPaths. I can't immediately give you code for your problem but if you read the tutorial I think it maps quite well.

peter.murray.rust
+2  A: 

This XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output indent="yes" />

    <xsl:template match="/">
    <table>
<!--Based upon your comments, it sounds as if you don't know the structure of the XML you will be dealing with (element nesting or attribute names).
     That makes it a little bit difficult.
     For the example XML you gave, the following for-each will work:-->
     <xsl:for-each select="//C"> <!--You could also use "/A/B/C" -->
     <tr>
<!--This looks up the node tree and creates a column for the current element, as well as for each of its ancestors, using the first attribute as the value.-->
      <xsl:for-each select="ancestor-or-self::*">
      <td><xsl:value-of select="@*[1]"/></td>
      </xsl:for-each>
     </tr>
     </xsl:for-each>
    </table>
    </xsl:template>

</xsl:stylesheet>

works for the XML provided and produces the following:

<?xml version="1.0" encoding="UTF-16"?>
<table>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC1</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC2</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB1</td>
<td>NameC3</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC4</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC5</td>
</tr>
<tr>
<td>NameA</td>
<td>NameB2</td>
<td>NameC6</td>
</tr>
</table>
Mads Hansen
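Since the question is about Java, here is a minimal sketch of how a stylesheet like the one above could be applied with the standard javax.xml.transform API. The class name XsltRunner and the method transform are made up for illustration, and the stylesheet in main is a condensed copy of the one in the answer (without indent="yes" and with the XML declaration suppressed), so the whitespace of the result will differ from the output shown above:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRunner {

    // Applies the given XSLT (as a string) to the given XML (as a string).
    static String transform(String xslt, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        // Leave out the <?xml ...?> declaration so only the table is returned.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Condensed version of the stylesheet from the answer above.
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/'><table>"
            + "<xsl:for-each select='//C'><tr>"
            + "<xsl:for-each select='ancestor-or-self::*'>"
            + "<td><xsl:value-of select='@*[1]'/></td>"
            + "</xsl:for-each></tr></xsl:for-each>"
            + "</table></xsl:template></xsl:stylesheet>";
        String xml = "<A Name='NameA'><B Name='NameB1'><C Name='NameC1'/><C Name='NameC2'/></B></A>";
        System.out.println(transform(xslt, xml));
    }
}
```

The built-in JDK transformer only supports XSLT 1.0, which is enough for this stylesheet; an XSLT 2.0 solution would need a third-party processor on the classpath.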