Given the following XML document (assume there are more books than actually listed), and using the Java implementation of XPath, what expression would I use to find a unique set of author names?

<inventory>
    <book year="2000">
        <title>Snow Crash</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553380958</isbn>
        <price>14.95</price>
    </book>

    <book year="2005">
        <title>Burning Tower</title>
        <author>Larry Niven</author>
        <author>Jerry Pournelle</author>
        <publisher>Pocket</publisher>
        <isbn>0743416910</isbn>
        <price>5.99</price>
    </book>

    <book year="1995">
        <title>Zodiac</title>
        <author>Neal Stephenson</author>
        <publisher>Spectra</publisher>
        <isbn>0553573862</isbn>
        <price>7.50</price>
    </book>

    <!-- more books... -->

</inventory>
A:
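import java.util.HashSet;
import java.util.Set;
import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

// doc is the already-parsed org.w3c.dom.Document
Set<String> uniqueAuthors = new HashSet<String>();
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression expr = xPath.compile("//book/author/text()"); // one text node per <author>
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    // the Set silently ignores names it already contains
    uniqueAuthors.add(nodes.item(i).getNodeValue());
}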

I used the excellent article "The Java XPath API" as a reference.

XPath Version 1.0 cannot, in general, select distinct values, so I inserted the authors into a Set. At the end of the for loop, the set contains each author name exactly once.
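For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. It assumes the sample XML is held in a string (trimmed to titles and authors) and swaps the `HashSet` for a `LinkedHashSet`, so duplicates are still dropped but names come out in the order they first appear in the document. The names `UniqueAuthors`, `uniqueAuthors` and `parse` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class UniqueAuthors {
    // Trimmed copy of the question's sample inventory (titles and authors only).
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    static Set<String> uniqueAuthors(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // XPath 1.0 returns every author text node, duplicates included.
        NodeList nodes = (NodeList) xPath.evaluate("//book/author/text()", doc, XPathConstants.NODESET);
        // LinkedHashSet drops duplicates but keeps first-seen document order.
        Set<String> authors = new LinkedHashSet<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            authors.add(nodes.item(i).getNodeValue());
        }
        return authors;
    }

    public static void main(String[] args) throws Exception {
        // prints [Neal Stephenson, Larry Niven, Jerry Pournelle]
        System.out.println(uniqueAuthors(parse(XML)));
    }
}
```

A plain `HashSet`, as in the answer, yields the same three names but in no guaranteed order.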

Daniel Trebbien
+1 But I wonder why the sample data in the OP is exactly the same as in the linked article.
stacker
That would be because I took the sample data from the same article. The solution given here (which isn't in the article) is a good one, but not quite what I was looking for. I was hoping to find out whether an XPath query by itself can return a unique set, rather than extracting a unique set from a query's results.
J E Bailey
@J E Bailey: XPath does not have a way to select a set of unique values. However, XQuery does: http://www.xqueryfunctions.com/xq/fn_distinct-values.html
Daniel Trebbien
@Daniel Trebbien: Following up on the XQuery lead, it appears that the distinct-values function is part of the XPath 2.0 specification. There's a nice summary of the differences here: http://msdn.microsoft.com/en-us/magazine/cc188789.aspx. It also shows an XPath 1.0 idiom for selecting unique values, which distinct-values was meant to replace. For my sample XML it would be `//author[not(preceding::author = .)]`, which answers my question :)
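The XPath 1.0 idiom can be tried directly against the JDK's built-in engine. No Java-side `Set` is needed here, because the predicate already discards every `<author>` whose string value matches an earlier `<author>` in document order. This is a sketch assuming the XML is available as a string (again trimmed to the relevant elements); `DistinctAuthorsXPath` and `distinctAuthors` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DistinctAuthorsXPath {
    // Trimmed copy of the question's sample inventory.
    static final String XML =
          "<inventory>"
        + "<book year=\"2000\"><title>Snow Crash</title><author>Neal Stephenson</author></book>"
        + "<book year=\"2005\"><title>Burning Tower</title><author>Larry Niven</author>"
        + "<author>Jerry Pournelle</author></book>"
        + "<book year=\"1995\"><title>Zodiac</title><author>Neal Stephenson</author></book>"
        + "</inventory>";

    // Keep an <author> only if no earlier <author> in the document has the same value.
    static final String DISTINCT_AUTHORS = "//author[not(preceding::author = .)]";

    static List<String> distinctAuthors(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(DISTINCT_AUTHORS, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // prints [Neal Stephenson, Larry Niven, Jerry Pournelle]
        System.out.println(distinctAuthors(XML));
    }
}
```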
J E Bailey
@J E Bailey: Interesting. I wasn't aware of the new `distinct-values` function in XPath 2.0, but note that the Java XPath API is still XPath Version 1.0, so you may not be able to use it yet. Also, I saw tricks that used `preceding-sibling::` as a way of selecting distinct values, but it seems to me that to use this trick in this case, all authors would need to appear in the document *alphabetically*. Therefore, I don't know if the `preceding-sibling::` location path trick is truly equivalent.
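Whether `distinct-values` is usable can be probed at runtime. The sketch below assumes the default `XPathFactory` (no Saxon or other XPath 2.0 engine on the classpath); it tries to evaluate `distinct-values(//author)` and reports whether the expression was accepted. On the JDK's built-in XPath 1.0 engine the unknown function should be rejected with an `XPathExpressionException`. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPath2Check {
    // Returns true if the default XPathFactory accepts the XPath 2.0 distinct-values() function.
    static boolean supportsDistinctValues() throws Exception {
        // A tiny in-memory document is enough; only the expression matters here.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("inventory"));
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            xPath.evaluate("distinct-values(//author)", doc);
            return true;
        } catch (XPathExpressionException e) {
            // An XPath 1.0 engine does not know the function and reports it here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct-values supported: " + supportsDistinctValues());
    }
}
```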
Daniel Trebbien
@Daniel Trebbien: A follow-up for completeness. I looked into the `preceding::` axis that my expression uses (`preceding-sibling::` only looks at earlier siblings under the same parent). I wouldn't classify it as a trick: it effectively looks at the set of all nodes that come before the current node in document order and checks whether any of them matches the current node. So alphabetical order isn't an issue. However, from the reading I've done, and based on how the implementation works, the processing time grows quadratically with the list size, because each author is compared with every author before it. So in the end your solution is probably the most efficient.
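The ordering question can be checked directly. This sketch uses hypothetical data, with author names deliberately not in alphabetical order and spread over two `<book>` elements, and runs the `preceding::` expression next to a `preceding-sibling::` variant. `preceding::` returns each name once regardless of order, while `preceding-sibling::` only compares authors inside the same `<book>`, so the repeated name in the second book survives. `PrecedingAxisDemo` and `select` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PrecedingAxisDemo {
    // Hypothetical data: names NOT in alphabetical order, "Zed" repeated across two books.
    static final String XML =
          "<inventory>"
        + "<book><author>Zed</author><author>Amy</author></book>"
        + "<book><author>Zed</author><author>Bob</author></book>"
        + "</inventory>";

    static List<String> select(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // preceding:: sees every earlier author in the document: prints [Zed, Amy, Bob]
        System.out.println(select("//author[not(preceding::author = .)]"));
        // preceding-sibling:: only sees earlier authors in the same <book>: prints [Zed, Amy, Zed, Bob]
        System.out.println(select("//author[not(preceding-sibling::author = .)]"));
    }
}
```

On cost: a straightforward evaluation of `not(preceding::author = .)` compares each of the n authors with every author before it, roughly n(n-1)/2 string comparisons, which is quadratic growth. The `Set` approach does one hash lookup per author instead.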
J E Bailey