ansaurus

Question

Answer 1


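import java.util.HashSet;
import java.util.Set;
import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

// Assumes doc is an already-parsed org.w3c.dom.Document; compile() and evaluate() throw XPathExpressionException
Set<String> uniqueAuthors = new HashSet<String>();
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
XPathExpression expr = xPath.compile("//book/author/text()");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
// Adding to a Set silently discards duplicate author names
for (int i = 0; i < nodes.getLength(); ++i) {
    uniqueAuthors.add(nodes.item(i).getNodeValue());
}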

I used the excellent article "The Java XPath API" as a reference.

XPath Version 1.0 cannot, in general, select distinct values, so I inserted the authors into a `Set`, which discards duplicates. At the end of the `for` loop, the set will contain each distinct author exactly once.
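For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The class name `UniqueAuthors`, its helper methods and the sample XML are hypothetical. It calls `XPath.evaluate` directly instead of compiling a separate `XPathExpression`, and it uses a `TreeSet` instead of the answer's `HashSet` so that the printed result is sorted and deterministic.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UniqueAuthors {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Selects every author text node with an XPath 1.0 query; the Set drops duplicates.
    static Set<String> uniqueAuthors(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                    "//book/author/text()", doc, XPathConstants.NODESET);
            Set<String> authors = new TreeSet<String>(); // sorted, so output is deterministic
            for (int i = 0; i < nodes.getLength(); ++i) {
                authors.add(nodes.item(i).getNodeValue());
            }
            return authors;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data with one duplicated author
        String xml = "<inventory>"
                + "<book><author>Bob</author></book>"
                + "<book><author>Alice</author></book>"
                + "<book><author>Bob</author></book>"
                + "</inventory>";
        System.out.println(uniqueAuthors(parse(xml))); // [Alice, Bob]
    }
}
```

Swapping the `TreeSet` for a `LinkedHashSet` would instead keep the authors in the order they first appear in the document.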

Daniel Trebbien 2010-06-03 17:48:24

+1 But I wonder why the sample data in the OP is exactly the same as in the linked article.

stacker 2010-06-03 17:58:12

That would be because I took the sample data from the same article. The solution given here (which isn't in the article) is a good one, but not quite what I was looking for: I was hoping to find out whether an XPath query by itself could return a unique set, rather than extracting a unique set from a query's results.

J E Bailey 2010-06-03 18:15:44

@J E Bailey: XPath does not have a way to select a set of unique values. However, XQuery does: http://www.xqueryfunctions.com/xq/fn_distinct-values.html

Daniel Trebbien 2010-06-03 18:25:02

@Daniel Trebbien: Following up on the XQuery lead, it appears that the `distinct-values` function is part of the XPath 2.0 specification. There's a nice summary of the differences here: http://msdn.microsoft.com/en-us/magazine/cc188789.aspx. It also shows the XPath 1.0 idiom for selecting unique values that `distinct-values` was meant to replace. For the sample XML I had, it would be `//author[not(preceding::author = .)]`, which answers my question :)
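The XPath 1.0 expression from the comment above can be run as-is through Java's XPath API. A minimal sketch follows; the class name `DistinctAuthorsXPath` and the sample data are hypothetical, and the input is deliberately not in alphabetical order. The predicate keeps an `author` element only if no earlier `author` has the same string value, and the resulting nodes come back in document order, so each author is reported at its first occurrence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DistinctAuthorsXPath {

    // XPath 1.0: an <author> survives only if no preceding <author> has an equal string value.
    static final String DISTINCT_AUTHORS = "//author[not(preceding::author = .)]";

    static List<String> distinctAuthors(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                    DISTINCT_AUTHORS, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); ++i) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + DISTINCT_AUTHORS, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, deliberately unsorted sample data
        String xml = "<inventory>"
                + "<book><author>Carol</author></book>"
                + "<book><author>Alice</author></book>"
                + "<book><author>Carol</author></book>"
                + "<book><author>Bob</author></book>"
                + "<book><author>Alice</author></book>"
                + "</inventory>";
        System.out.println(distinctAuthors(xml)); // [Carol, Alice, Bob]
    }
}
```

No `Set` is needed here: the deduplication happens entirely inside the XPath expression.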

J E Bailey 2010-06-03 19:27:33

@J E Bailey: Interesting. I wasn't aware of the new `distinct-values` function in XPath 2.0, but note that the engine behind the standard Java XPath API (`javax.xml.xpath`) still implements XPath Version 1.0, so you may not be able to use it yet. Also, I have seen tricks that use `preceding-sibling::` as a way of selecting distinct values, but it seems to me that to use such a trick in this case, all authors would need to appear in the document *alphabetically*. Therefore, I don't know whether the `preceding-sibling::` location path trick is truly equivalent.
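One way to see which XPath version a runtime's default engine understands is to try an XPath 2.0 function and check whether it is rejected. A minimal probe, assuming the default `XPathFactory`: on a stock JDK this is the bundled XPath 1.0 engine, which treats `distinct-values` as an unknown function, but a third-party XPath 2.0 implementation registered as the default factory could change the answer. The class name `XPathVersionCheck` is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathVersionCheck {

    // Returns true if the default javax.xml.xpath engine accepts the XPath 2.0
    // function distinct-values(); an XPath 1.0 engine rejects it as unknown.
    static boolean supportsDistinctValues() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        doc.appendChild(doc.createElement("inventory")); // tiny context document
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            xPath.evaluate("distinct-values(//author)", doc);
            return true;
        } catch (XPathExpressionException e) {
            return false; // engine did not recognise the function
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct-values supported: " + supportsDistinctValues());
    }
}
```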

Daniel Trebbien 2010-06-03 20:14:41

@Daniel Trebbien: A follow-up for completeness. I checked how the `preceding::` axis in that expression works, and I wouldn't classify it as a trick: for each author node it effectively looks at all prior author nodes in the document and checks whether any of them matches the current node, so alphabetical order isn't an issue. However, from the reading I've done, and depending on how the implementation evaluates it, the processing time grows quadratically with the list size, since every node is compared against all of its predecessors. So in the end your solution is probably the most efficient.
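A rough count of the comparisons, assuming the engine evaluates the predicate naively by comparing each `author` node with every earlier one: the $i$-th of $n$ author nodes is compared with at most $i-1$ predecessors, so the worst case (all authors distinct) is

```latex
\sum_{i=1}^{n} (i - 1) \;=\; \frac{n(n-1)}{2} \;=\; O(n^2)
```

That is quadratic rather than exponential. The `HashSet` approach in the answer makes a single pass over the $n$ nodes with an expected constant-time insert per node, i.e. expected $O(n)$ work after the query itself.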

J E Bailey 2010-06-04 13:21:31


obtaining a unique set with Java's XPath
