Given an XML document such as this:

<root>
    <foo>
        <bar>a</bar>
        <bar>b</bar>
        <bar>c</bar>
    </foo>
    ...
</root>

How can I retrieve all foo-nodes that have bar-subnodes with certain values?

So for instance, if I need all foo-elements that have bar-subelements with values a and c, I am currently using this expression:

//*/foo[bar/text()='a'][bar/text()='c']

which is fine, except that it gets clumsy if I have more "bar-constraints" and I'm not too big of a fan of programmatically generated XPath expressions :). What I am looking for is something along these lines (obviously invalid syntax):

//*/foo[bar/text() in-set('a', 'c')]

Any ideas?

+1  A: 

You can use

//*/foo[bar/text()='a' or bar/text()='c']

There is no 'in' operator in XPath. Here's a list of XPath operators.

Rashmi Pandit
Right, but that doesn't really help me. I would still have to programmatically create that sequence which, given enough constraints, can be pretty long. Instead I'd rather just create the string `'a','b','c'` and not touch the XPath expression itself.
n3rd
A: 

If you have access to XPath 2.0 you can use the matches function:

//foo[fn:matches(bar, "a|c")]

Unfortunately XPath 2.0 isn't that widely supported.

Tomalak's answer gave me an idea:

//foo[bar[fn:matches(text(),"a|c")]]

That might work, since it's not working on a node set but an individual text node.

Welbog
I just tried your suggestion, but it seems that `matches()` does not accept a list as its first parameter.
n3rd
Damn. Figures. The bottom line is XPath isn't good at this sort of thing. There is no simple way to do what you want without using multiple `[]` clauses or `or` conditions.
Welbog
The second expression will match all `foo` nodes which have any `bar` node with either `"a"` or `"c"`. What's needed is to only match `foo` nodes which have _both_ `bar` with `"a"`, and another `bar` with `"c"`.
Pavel Minaev
A: 

For "AND" expressions you are more or less stuck with what you currently have. Though I'd write it as:

//*/foo[bar/text()='a' and bar/text()='c']

For "OR" expressions (your question is not clear on this) you can try:

//*/foo[bar[contains(',a,c,', concat(',', text(), ','))]]

Or, written more readably:

//*/foo[
  bar[
    contains(
      ',a,c,', 
      concat(',', text(), ',')
    )
  ]
]

This finds <foo> elements that contain <bar> elements which themselves conform to a certain rule. And the rule is: "text value (enclosed in commas) is contained in the search string (enclosed in commas)".

For a text value of 'a' and a search string of 'a,b' you would look for ',a,' within ',a,b,'. So if any of the inner <bar> elements matches, the outer <foo> element is going to be selected.

You'd have to choose a delimiter that cannot occur in the values, obviously. From your sample code, I chose ',' but any character that never appears in the values will do.
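
To make the rule concrete, here is a minimal Python sketch of the same delimiter trick, done in plain code rather than in XPath (the standard library's ElementTree does not implement `contains()`). The sample document, the `id` attributes, and the function name `foos_with_any` are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the id attributes are only labels for the output.
DOC = """
<root>
    <foo id="1"><bar>a</bar><bar>b</bar><bar>c</bar></foo>
    <foo id="2"><bar>b</bar></foo>
    <foo id="3"><bar>c</bar><bar>d</bar></foo>
</root>
"""

def foos_with_any(root, values, delimiter=","):
    """Mirror contains(',a,c,', concat(',', text(), ',')) for each <bar>."""
    # Build the delimited search string, e.g. ',a,c,'.
    haystack = delimiter + delimiter.join(values) + delimiter
    selected = []
    for foo in root.iter("foo"):
        for bar in foo.findall("bar"):
            # Wrap the text in delimiters so 'a' cannot match inside 'ab'.
            needle = delimiter + (bar.text or "") + delimiter
            if needle in haystack:
                selected.append(foo)
                break  # one matching <bar> is enough (OR semantics)
    return selected

root = ET.fromstring(DOC)
matched_ids = [foo.get("id") for foo in foos_with_any(root, ["a", "c"])]
print(matched_ids)  # ['1', '3']
```

As in the XPath version, this is an "any of" test, and it only behaves if the delimiter never appears inside a value.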

Tomalak
Wouldn't this require the `bar`-elements to be in a specific order as well as me knowing that order ahead of time?
n3rd
No. See my expanded answer.
Tomalak
+1  A: 

It is not entirely clear if you want AND or OR there. Your XPath example with two filters is an AND (i.e. require that foo has both "a" and "c"), but comments to other replies seem to imply that you actually want an OR (any foo with either "a" or "c"). With XPath 2.0, the latter would be very easy:

//foo[bar[. = ('a', 'c')]]

AND is a bit trickier:

//foo[count(distinct-values(data(bar[. = ('a', 'c')]))) = 2]

or, if you use variables (I'll show XQuery syntax, but in practice you should use your XPath implementation's API to provide values for external variables):

let $values := ('a', 'c')
return //foo[count(distinct-values(data(bar[. = $values]))) = count($values)]
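
If no XPath 2.0 engine is at hand, the same AND logic can be sketched in plain Python with the standard library. This is only an illustration of the counting idea, not an XPath 2.0 implementation; the sample document, the `id` attributes, and the name `foos_with_all` are assumptions, and the wanted values are assumed to be distinct:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the id attributes are only labels for the output.
DOC = """
<root>
    <foo id="1"><bar>a</bar><bar>b</bar><bar>c</bar></foo>
    <foo id="2"><bar>a</bar><bar>a</bar></foo>
    <foo id="3"><bar>c</bar><bar>d</bar></foo>
</root>
"""

def foos_with_all(root, values):
    """Keep <foo> elements whose <bar> children cover every wanted value."""
    wanted = set(values)
    selected = []
    for foo in root.iter("foo"):
        # Distinct text values of the <bar> children (like distinct-values).
        present = {bar.text for bar in foo.findall("bar")}
        # count(distinct-values(bar[. = $values])) = count($values)
        if len(present & wanted) == len(wanted):
            selected.append(foo)
    return selected

root = ET.fromstring(DOC)
all_ids = [foo.get("id") for foo in foos_with_all(root, ["a", "c"])]
print(all_ids)  # ['1']
```

The second foo has two bars with "a" but no "c", so taking distinct values first is what keeps it out of the result.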
Pavel Minaev
A: 

You can be sneaky about it and add an element to your document like:

<values>
   <value>a</value>
   <value>c</value>
</values>

Then this XPath

/root/foo[count(bar[.=/root/values/value]) = count(/root/values/value)]

will find all foo elements where the number of bar children whose value is in your list of values equals the number of values in that list. This works only if the bar elements within each foo contain distinct values.
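
To see why the distinct-values condition matters, here is a small Python sketch that applies the same counting rule next to a set-based check. It re-implements the logic in plain code rather than running the XPath; the sample document, the `id` attributes, and the names `count_rule` and `set_rule` are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the id attributes are only labels for the output.
DOC = """
<root>
    <foo id="1"><bar>a</bar><bar>c</bar></foo>
    <foo id="2"><bar>a</bar><bar>a</bar></foo>
    <foo id="3"><bar>a</bar><bar>a</bar><bar>c</bar></foo>
</root>
"""

VALUES = ["a", "c"]

def count_rule(foo, values):
    """Mirror count(bar[. = values]) = count(values)."""
    hits = [bar for bar in foo.findall("bar") if bar.text in values]
    return len(hits) == len(values)

def set_rule(foo, values):
    """Require every wanted value to occur at least once."""
    present = {bar.text for bar in foo.findall("bar")}
    return set(values) <= present

root = ET.fromstring(DOC)
by_count = {foo.get("id"): count_rule(foo, VALUES) for foo in root.iter("foo")}
by_set = {foo.get("id"): set_rule(foo, VALUES) for foo in root.iter("foo")}
print(by_count)  # {'1': True, '2': True, '3': False}
print(by_set)    # {'1': True, '2': False, '3': True}
```

With duplicate bar values the counting rule goes wrong in both directions: foo 2 is accepted although it has no "c", and foo 3 is rejected although it has both values.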

Robert Rossney