This query is returning values less than 1000. It should only be returning values between 1000 and 1100. Why is that?

//results/Building[ 1 = 1 and (( Vacancy/sqft > 1000 ) and ( Vacancy/sqft < 1100 ) ) ]

The query will return the following building, which has vacancies less than 1000 square feet and greater than 1100 square feet:

<Building>
  <Vacancy><sqft>900</sqft></Vacancy>
  <Vacancy><sqft>1000</sqft></Vacancy>
  <Vacancy><sqft>2000</sqft></Vacancy>
  <Vacancy><sqft>500</sqft></Vacancy>
</Building>

Why is it included in the results?

Sample data:

<results>
  <Building><!--Should this be selected?--></Building>

  <Building><!--Should be selected-->
    <Vacancy><sqft>1050</sqft></Vacancy>
  </Building>

  <Building><!--Should be selected-->
    <Vacancy><sqft>1025</sqft></Vacancy>
    <Vacancy><sqft>1075</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>10</sqft></Vacancy>
    <Vacancy><sqft>50</sqft></Vacancy>
  </Building>

  <Building><!--Should this be selected?-->
    <Vacancy><sqft>1050</sqft></Vacancy>
    <Vacancy><sqft>2000</sqft></Vacancy>
  </Building>

  <Building><!--Should this be selected?-->
    <Vacancy><sqft>900</sqft></Vacancy>
    <Vacancy><sqft>1040</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>10500</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>900</sqft></Vacancy>
    <Vacancy><sqft>1000</sqft></Vacancy>
    <Vacancy><sqft>2000</sqft></Vacancy>
    <Vacancy><sqft>500</sqft></Vacancy>
  </Building>

</results>

Thanks.

+5  A: 

The sample Building has a Vacancy child with a sqft of 2000, so Vacancy/sqft > 1000 succeeds. It also has a child with a sqft of 1000 (and 900 and 500), so Vacancy/sqft < 1100 succeeds. Thus the XPath selects the Building.

The comparison expressions (such as Vacancy/sqft > 1000) are implicitly quantified with "there exists" (as in "there exists a Vacancy child that has a sqft child with value > 1000") because Vacancy/sqft is a set of nodes rather than a single node. Moreover, each comparison has its own quantifier, so the sqft in Vacancy/sqft > 1000 doesn't need to be the same sqft as in Vacancy/sqft < 1100. Note that //results/Building is also a node set; the predicate [...] applies separately to each item in the set, which is why there isn't an issue with quantifiers there. Translating your original XPath into English, we get:

Select the buildings (in the results) such that 1=1 and there exists a vacancy square footage > 1000 and there exists a vacancy square footage < 1100.

Let's take the English statement of the desired query and make it a little closer to XPath, arriving at one of:

Select the buildings (in the results) such that there exists a vacancy with square footage such that it's > 1000 and it's < 1100

Select the buildings (in the results) such that there exists a vacancy such that the square footage > 1000 and the square footage < 1100

The former leads to jasso's solution, the latter to:

//results/Building[ Vacancy[1000 < sqft and sqft < 1100] ]
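As a rough illustration of the difference, here is a plain-Python model of the two predicates. It is not an XPath engine: the function names `original_query` and `per_vacancy_query` are made up for this sketch, and it assumes each Vacancy has exactly one sqft child, as in the sample data.

```python
# Plain-Python model of XPath 1.0 node-set comparisons (illustrative only).
# Function names are hypothetical; the sqft values come from the question.

def original_query(sqft_values):
    # Vacancy/sqft > 1000 and Vacancy/sqft < 1100
    # Each comparison carries its own "there exists", so two different
    # sqft nodes may satisfy the two halves.
    return (any(v > 1000 for v in sqft_values)
            and any(v < 1100 for v in sqft_values))

def per_vacancy_query(sqft_values):
    # Vacancy[1000 < sqft and sqft < 1100]
    # A single vacancy has to satisfy both bounds at once.
    return any(1000 < v < 1100 for v in sqft_values)

problem_building = [900, 1000, 2000, 500]
print(original_query(problem_building))     # True: 2000 > 1000 and 900 < 1100
print(per_vacancy_query(problem_building))  # False: no single value is in range
print(per_vacancy_query([900, 1040]))       # True: 1040 is in range
```

The only change between the two functions is whether the two bounds share one `any()` or get one each, which is the same move as pulling both tests inside the Vacancy[...] predicate.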

Original solution

Try the logical double-negation of the condition:

//results/Building[ Vacancy and not (Vacancy/sqft <= 1000 or Vacancy/sqft >= 1100) ]

This predicate includes a test for Vacancy children to filter out cases that are otherwise trivially true, i.e. buildings with no vacancies. The English equivalent of this solution is:

Select buildings (in the results) such that the building has a vacancy and it's not the case that there exists a vacancy square footage <= 1000 or there exists a vacancy square footage >= 1100

In fewer words:

Select all buildings with vacancies where no vacancy has <= 1000 square feet or >= 1100 square feet.

In fewer words still:

Select all buildings with vacancies where all vacancies are between 1000 and 1100 square feet.
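To check that the double negation really does say "all vacancies are in range", here is a small plain-Python sketch. The names `double_negation` and `all_in_range` are invented for the example, and a building is modeled as a list of sqft values (one per Vacancy, which is an assumption that holds for the sample data).

```python
# Illustrative model only; not an XPath evaluator.

def double_negation(sqft_values):
    # Vacancy and not (Vacancy/sqft <= 1000 or Vacancy/sqft >= 1100)
    has_vacancy = len(sqft_values) > 0
    some_too_small = any(v <= 1000 for v in sqft_values)
    some_too_large = any(v >= 1100 for v in sqft_values)
    return has_vacancy and not (some_too_small or some_too_large)

def all_in_range(sqft_values):
    # "Buildings with vacancies where all vacancies are between 1000 and 1100"
    return len(sqft_values) > 0 and all(1000 < v < 1100 for v in sqft_values)

# Buildings from the sample data, in document order.
buildings = [[], [1050], [1025, 1075], [10, 50], [1050, 2000],
             [900, 1040], [10500], [900, 1000, 2000, 500]]

for sqft in buildings:
    # The two formulations agree on every sample building.
    assert double_negation(sqft) == all_in_range(sqft)

print([double_negation(sqft) for sqft in buildings])
# [False, True, True, False, False, False, False, False]
```

Without the leading Vacancy test, the empty building would be selected too, since "no vacancy is out of range" is trivially true when there are no vacancies.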

outis
Yes I see your logic but that's not the result I was expecting.
Simone Maynard
What outis is saying is that all of the conditions in your predicate are met, so you will get that `Building` every time. You need to look at it from the context of the `Building`. You're not testing all of the `Vacancy/sqft` values at once. It's easier to think of the XPath you wrote like this: Does 1 = 1? Yes. Does `Building` have a `Vacancy/sqft` greater than 1000? Yes. Does `Building` have a `Vacancy/sqft` less than 1100? Yes.
DevNull
Ok, thanks. I see what you are saying.
Simone Maynard
A: 

Try changing your XPath to this:

//results/Building[number(Vacancy/sqft) > 1000  and  number(Vacancy/sqft) < 1100 ]

I suspect it's treating your Vacancy/sqft node like text which could be causing some weirdness...

I removed your 1=1 and extra parens because I didn't see a need for them. The main point is to try the number function.
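One caveat worth sketching: in XPath 1.0, number() applied to a node-set converts only the first node in document order (and an empty node-set gives NaN). The plain-Python model below illustrates that rule; `xpath_number` and `number_query` are hypothetical names, and Python's float() is a simplification because it accepts some strings (such as "1e3") that XPath would turn into NaN.

```python
import math

def xpath_number(node_strings):
    # Model of number(node-set): use the string-value of the FIRST node only.
    if not node_strings:
        return math.nan          # empty node-set -> "" -> NaN
    try:
        return float(node_strings[0].strip())
    except ValueError:
        return math.nan          # non-numeric text -> NaN

def number_query(node_strings):
    # number(Vacancy/sqft) > 1000 and number(Vacancy/sqft) < 1100
    n = xpath_number(node_strings)
    return n > 1000 and n < 1100  # any comparison with NaN is False

print(number_query(["1050", "1101"]))  # True: only 1050 is looked at
print(number_query(["900", "1040"]))   # False: only 900 is looked at
print(number_query([]))                # False: NaN fails both comparisons
```

So the result depends on which Vacancy happens to come first in the Building, not on the set as a whole.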

UPDATE

This one is a little odd, but it grabs the ones you want plus the empty Building that you aren't sure about (Should this be selected?):

//results/Building[count(Vacancy[sqft > 1000  and  sqft < 1100 ]) = count(Vacancy)]

and if you want to exclude that one:

//results/Building[(count(Vacancy[sqft > 1000  and  sqft < 1100 ]) = count(Vacancy)) and count(Vacancy) > 0]

Also, I am using this site to test my XPaths. If there is some sort of fundamental difference between how they do it and how Objective-C does it, let me know...

Abe Miessler
Good catch, but "If one object to be compared is a node-set and the other is a number, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the number to be compared and on the result of converting the string-value of that node to a number using the number function is true." (XPath 1.0, §3.4).
outis
Have you tested it? I got the expected results when I ran it...
Abe Miessler
Also, that was one of the most complicated sentences I've ever read. Is it just me?
Abe Miessler
This fails with this data: `<Building><Vacancy><sqft>1050</sqft></Vacancy><Vacancy><sqft>1101</sqft></Vacancy></Building>` (it will return this `Building` even though there is a `Vacancy/sqft` greater than 1100).
DevNull
There is also one between 1000 and 1100. I was under the impression that she wanted to return building nodes that had sqft in that range...
Abe Miessler
Yes, thanks, I tested it, but that doesn't make a difference. It already compares it as a number....
Simone Maynard
@Abe: Welcome to the world of standards documents. Sadly, they get much, much worse. Your solution does indeed work with my first test data, but try it against the updates. The problem is that number() is applied to an entire node set, which first converts the set to a single string.
outis
@outis: There's nothing wrong with the standards document. `number()` works exactly as described. The problem is that people try to use functions and operators without understanding the distinction between those that operate on nodesets, and those that force their argument(s) to a single value. Thus @Abe's first XPath tests the first `Vacancy/sqft` in each building, but ignores the others.
LarsH
@Abe, re: your last couple of XPath expressions -- `count(A[test]) = count(A)` is saying "all `A` satisfy `test`", i.e. "there are no `A` that don't satisfy `test`". That can be written `not(A[not(test)])`. This will likely be more efficient since it doesn't have to count the number of A's.
LarsH
@LarsH: I didn't say the standards document was in error, merely that standards document language is complex. For the rest, we're saying the same thing: it's a problem of application, rather than specification. `number()`, when applied to a node set, won't accomplish what Abe wanted it to accomplish.
outis
@outis, I agree that standards documents are sometimes difficult to understand, but then their role is not that of a tutorial for the unacquainted. Their role is that of a precise specification for implementors, trainers, and those that must know the language inside and out. I didn't learn XPath primarily from the standards doc, and I wouldn't expect Abe to either.
LarsH
@LarsH: definitely. They're also helpful when something doesn't work the way we expect it to, which means something is wrong with our conceptual model. A standards doc tells us how things should behave, so that we can adjust and correct said conceptual model.
outis
+3  A: 
//results
  /Building[1 = 1 and 
            (( Vacancy/sqft > 1000 ) and (Vacancy/sqft < 1100 ))]

This query is returning values less than 1000. It should only be returning values between 1000 and 1100. Why is that?

From http://www.w3.org/TR/xpath/#booleans

If one object to be compared is a node-set and the other is a number, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the number to be compared and on the result of converting the string-value of that node to a number using the number function is true.

Node set comparisons are existential comparisons. Vacancy/sqft > 1000 means: Is there at least one Vacancy/sqft greater than 1000?

If you want to select Building elements that have Vacancy/sqft grandchildren, all of them in the range (1000,1100), use this XPath expression:

/results/Building[Vacancy/sqft and not(Vacancy/sqft[1000 >= . or . >= 1100])]
Alejandro
+1 for the only correct answer
Dimitre Novatchev
This expression doesn't match "All buildings with a vacancy between 1000 and 1100 square feet". But maybe the OP added that definition after you posted your answer. +1 for explanation of key concept, existential comparisons.
LarsH
+1 Excellent explanation
Garett
@LarsH: all the answers were given before the English statement of the query was specified, so there's that. We all headed off in different directions before the goal was set.
outis
@outis: HIWTH (hate it when that happens)
LarsH
+3  A: 

Do you also need to match buildings that have some sqft outside your criteria but at least one sqft between 1000 and 1100, like this?

  <Building>Should this be selected too?
    <Vacancy><sqft>1000</sqft></Vacancy>
    <Vacancy><sqft>1050</sqft></Vacancy>
    <Vacancy><sqft>2000</sqft></Vacancy>
  </Building>

If yes, then use this XPath expression:

/results/Building[Vacancy/sqft[. > 1000 and 1100 > . ] or not(Vacancy)]

It also selects buildings with no <Vacancy> element (as requested).
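For anyone who wants to check this against the sample data without an XPath 1.0 engine, here is a minimal sketch using Python's standard-library ElementTree. ElementTree only supports a limited XPath subset, so the range test is done in Python rather than in the path expression; `SAMPLE` (the question's data with comments removed) and `selected_indexes` are names invented for this example.

```python
import xml.etree.ElementTree as ET

# The question's sample data, comments stripped.
SAMPLE = """<results>
  <Building/>
  <Building><Vacancy><sqft>1050</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>1025</sqft></Vacancy><Vacancy><sqft>1075</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>10</sqft></Vacancy><Vacancy><sqft>50</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>1050</sqft></Vacancy><Vacancy><sqft>2000</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>900</sqft></Vacancy><Vacancy><sqft>1040</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>10500</sqft></Vacancy></Building>
  <Building><Vacancy><sqft>900</sqft></Vacancy><Vacancy><sqft>1000</sqft></Vacancy><Vacancy><sqft>2000</sqft></Vacancy><Vacancy><sqft>500</sqft></Vacancy></Building>
</results>"""

def selected_indexes(xml_text):
    # Mirrors: Building[Vacancy/sqft[. > 1000 and 1100 > .] or not(Vacancy)]
    root = ET.fromstring(xml_text)
    picked = []
    for index, building in enumerate(root.findall("Building")):
        has_vacancy = building.find("Vacancy") is not None
        sqft = [float(node.text) for node in building.findall("Vacancy/sqft")]
        if not has_vacancy or any(1000 < v < 1100 for v in sqft):
            picked.append(index)
    return picked

print(selected_indexes(SAMPLE))  # [0, 1, 2, 4, 5]
```

The zero-based indexes correspond to the empty building, the two "Should be selected" buildings, and the two mixed buildings that each contain one vacancy in range.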

jasso
+1 matches the English problem statement.
LarsH
+1  A: 

Here are two XPath expressions:

1. The following selects all nodes that you believe should be selected:

/*/*[Vacancy and not(Vacancy[. < 1000 or . > 1100])]

2. The following selects all nodes that you believe should be selected and all those you are not certain about. It doesn't select any node that you are certain should not be selected:

/*/*[not(Vacancy) or Vacancy[. > 1000 and not(. > 1100)]]

This XSLT transformation can be used to verify the correctness of the XPath expressions:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*[Vacancy and not(Vacancy[. &lt; 1000 or . > 1100])]
  "/>

===============================
  <xsl:copy-of select=
  "/*/*[not(Vacancy) or Vacancy[. > 1000 and not(. > 1100)]]
  "/>

 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<results>
  <Building><!--Should this be selected?--></Building>

  <Building><!--Should be selected-->
    <Vacancy><sqft>1050</sqft></Vacancy>
  </Building>

  <Building><!--Should be selected-->
    <Vacancy><sqft>1025</sqft></Vacancy>
    <Vacancy><sqft>1075</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>10</sqft></Vacancy>
    <Vacancy><sqft>50</sqft></Vacancy>
  </Building>

  <Building><!--Should this be selected?-->
    <Vacancy><sqft>1050</sqft></Vacancy>
    <Vacancy><sqft>2000</sqft></Vacancy>
  </Building>

  <Building><!--Should this be selected?-->
    <Vacancy><sqft>900</sqft></Vacancy>
    <Vacancy><sqft>1040</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>10500</sqft></Vacancy>
  </Building>

  <Building><!--Shouldn't be selected-->
    <Vacancy><sqft>900</sqft></Vacancy>
    <Vacancy><sqft>1000</sqft></Vacancy>
    <Vacancy><sqft>2000</sqft></Vacancy>
    <Vacancy><sqft>500</sqft></Vacancy>
  </Building>

</results>

the wanted, correct results are produced:

<Building><!--Should be selected-->
   <Vacancy>
      <sqft>1050</sqft>
   </Vacancy>
</Building>
<Building><!--Should be selected-->
   <Vacancy>
      <sqft>1025</sqft>
   </Vacancy>
   <Vacancy>
      <sqft>1075</sqft>
   </Vacancy>
</Building>

===============================
  <Building><!--Should this be selected?--></Building>
<Building><!--Should be selected-->
   <Vacancy>
      <sqft>1050</sqft>
   </Vacancy>
</Building>
<Building><!--Should be selected-->
   <Vacancy>
      <sqft>1025</sqft>
   </Vacancy>
   <Vacancy>
      <sqft>1075</sqft>
   </Vacancy>
</Building>
<Building><!--Should this be selected?-->
   <Vacancy>
      <sqft>1050</sqft>
   </Vacancy>
   <Vacancy>
      <sqft>2000</sqft>
   </Vacancy>
</Building>
<Building><!--Should this be selected?-->
   <Vacancy>
      <sqft>900</sqft>
   </Vacancy>
   <Vacancy>
      <sqft>1040</sqft>
   </Vacancy>
</Building>
Dimitre Novatchev
I don't think either of these exactly matches "All buildings with a vacancy between 1000 and 1100 square feet"... or else I'm missing something.
LarsH
@LarsH: One selects all "should" nodes, the other selects these + all "in doubt nodes".
Dimitre Novatchev