Suppose I have xml like this:

<Products>
  <Product name="Liquid Oxygen">
    <Manufacturer name="Universal Exports" country="UK" />
  </Product>
  <Product name="Hydrazine">
    <Manufacturer name="ACME Inc." country="USA" />
  </Product>
  <Product name="Gaseous Oxygen" obsolete="true">
    <Manufacturer name="Universal Exports" country="UK" />
  </Product>
  <Product name="Liquid Nitrogen">
    <Manufacturer name="Penguins R Us" country="Antarctica" />
  </Product>
</Products>

And I want to pick out the Product nodes that have a Manufacturer subnode with @country of UK, but that do not have an @obsolete of true. I can say either

/Products/Product[Manufacturer/@country = 'UK' and not(@obsolete = 'true')]

or

/Products/Product[Manufacturer/@country = 'UK'][not(@obsolete = 'true')]

and both get me the nodes I want.
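
As a quick check of that claim, here is a minimal sketch that runs both expressions against the sample document. It uses Java's standard javax.xml.xpath rather than .NET, which is an assumption made purely for illustration; the class name PredicateStyles and the helper productNames are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateStyles {
    // The sample document from the question, with single-quoted attributes.
    static final String XML =
          "<Products>"
        + "<Product name='Liquid Oxygen'><Manufacturer name='Universal Exports' country='UK'/></Product>"
        + "<Product name='Hydrazine'><Manufacturer name='ACME Inc.' country='USA'/></Product>"
        + "<Product name='Gaseous Oxygen' obsolete='true'><Manufacturer name='Universal Exports' country='UK'/></Product>"
        + "<Product name='Liquid Nitrogen'><Manufacturer name='Penguins R Us' country='Antarctica'/></Product>"
        + "</Products>";

    // Hypothetical helper: evaluates an XPath 1.0 expression against XML and
    // returns the name attribute of every selected element, in document order.
    static List<String> productNames(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // One predicate joined with "and".
        System.out.println(productNames(
            "/Products/Product[Manufacturer/@country = 'UK' and not(@obsolete = 'true')]"));
        // prints [Liquid Oxygen]

        // Two stacked predicates.
        System.out.println(productNames(
            "/Products/Product[Manufacturer/@country = 'UK'][not(@obsolete = 'true')]"));
        // prints [Liquid Oxygen]
    }
}
```

Neither condition here depends on the position of a node, so both forms select the same single Product.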

My question is: is there any functional difference between these two approaches to and-ed conditions? Is there a situation in which the two approaches could give different results? (I realise that and serves a purpose within more complex conditions.) Stylistically, is one to be preferred over the other?

(I'm using C# and .NET 2.0, but I don't believe that will have any bearing on the answer)

+1  A: 

While there may be a slight difference in how they are processed internally, there is no difference in the results of your two statements. Likewise, there is no performance benefit or recommendation that favours using one over the other.

I personally prefer the first (and) as it's easier to tell what is going on, particularly when you add more complex comparisons (ands and ors) to the mix.

Richard Szalay
+5  A: 

With and in your situation, both give you the same results. As for style, choose the one that expresses your intention better.

However, as soon as you start using positional functions such as position() and last(), it does make a difference.

See for yourself:

/Products/Product[Manufacturer/@country = 'UK' and (position() = 1)]

does not return any node if the first product in the list is not from the UK, while:

/Products/Product[Manufacturer/@country = 'UK'][position() = 1]

will return the first node matching UK as country, and may also be written in short form:

/Products/Product[Manufacturer/@country = 'UK'][1]
Lucero
+3  A: 

The difference between them is:

  • the first one (… and …) is a single predicate, that must be fulfilled as a whole
  • the second one (…][…) is multiple predicates, that are evaluated one by one

As long as you use a conjunction (logical "and") to tie conditions together, the outcome is the same. A disjunction (logical "or") is not achievable with the latter.

Also note that multiple predicates are processed in the order they are defined, which means that

/Products/Product[Manufacturer/@country = 'UK'][2]

and

/Products/Product[Manufacturer/@country = 'UK' and position() = 2]

are two different things:

  • the second product of those being manufactured in the UK vs.
  • the second product of the list, but only when it is manufactured in the UK
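
The two cases above can be tried against the question's sample document. This is only a sketch, written with Java's standard javax.xml.xpath as a stand-in for the asker's .NET code; the class name PositionalPredicates and the helper productNames are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionalPredicates {
    // The sample document from the question: products 1 and 3 are from the UK.
    static final String XML =
          "<Products>"
        + "<Product name='Liquid Oxygen'><Manufacturer name='Universal Exports' country='UK'/></Product>"
        + "<Product name='Hydrazine'><Manufacturer name='ACME Inc.' country='USA'/></Product>"
        + "<Product name='Gaseous Oxygen' obsolete='true'><Manufacturer name='Universal Exports' country='UK'/></Product>"
        + "<Product name='Liquid Nitrogen'><Manufacturer name='Penguins R Us' country='Antarctica'/></Product>"
        + "</Products>";

    // Hypothetical helper: returns the name attribute of every selected element.
    static List<String> productNames(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Stacked predicates: filter to UK products first, then take the second of those.
        System.out.println(productNames(
            "/Products/Product[Manufacturer/@country = 'UK'][2]"));
        // prints [Gaseous Oxygen]

        // Single predicate: position() counts among all Product children, and the
        // second Product (Hydrazine) is from the USA, so nothing matches.
        System.out.println(productNames(
            "/Products/Product[Manufacturer/@country = 'UK' and position() = 2]"));
        // prints []

        // The single-predicate form does match when the second Product qualifies.
        System.out.println(productNames(
            "/Products/Product[Manufacturer/@country = 'USA' and position() = 2]"));
        // prints [Hydrazine]
    }
}
```

In the stacked form, the second predicate sees only the nodes that survived the first one, so positions are renumbered; in the single-predicate form, position() refers to the node's place among all Product children.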
Tomalak
+1  A: 

The XmlVisualizer tool (Windows/.NET only) is handy for visualizing XPath results, or for experimenting with XPath.

Free & open source.

Cheeso