views:

132

answers:

1

Take a simple OrderDetail table which has a Quantity and UnitPrice for each record. To get the total value of each Order with SQL it's simple as

SELECT OrderID, SUM(UnitPrice*Quantity)
FROM OrderDetail
GROUP BY OrderID

After converting the table to a XML file, using XQUERY I'm able to get the same information like this

for $orderId in distinct-values(doc('orderDetails.xml')//orderDetails/OrderID)
   let $totalValue := 
      sum(
      for $detail in doc("orderdetails.xml")//OrderDetails[OrderID =$orderId]
         return $detail/Quantity * $detail/UnitPrice
      )
return <order id="{$orderId}" totalValue="{$totalValue}" />

Ignoring the sheer stupidity of getting the information like this, is there a better way to write the XQUERY expression? As it is it's very, very slow (I'm using XMLSpy).

+1  A: 

This is a major flaw in XQuery 1.0, and for this reason a group-by clause has been added in XQuery 1.1, they have added a group by clause, so your query would look like this:

for $orderDetails in doc('orderDetails.xml')//OrderDetails)
let $orderId = $orderDetails/OrderID
let $orderCost = $orderDetails/Quantity * $orderDetails/UnitPrice
group by $orderId
let $totalValue := sum($orderCost)
return <order id="{$orderId}" totalValue="{$totalValue}" />

Unfortunately however, XQuery 1.1 is still just a working draft, and few implementations are available.

Our implementation (XQSharp) tries to spot the pattern you used and do the group by more efficiently (this shows up as a group-by in the query plan). Unfortunately though, our implementation does not spot a group by in your particular case.

The first problem is casing differences ("orderdetails.xml" vs "orderDetails.xml", //orderDetails vs //OrderDetails) - I'll assume these are just typos.

The biggest problem is that what you have written is not a trivial group by!

Unless you are using a schema which indicates otherwise, static analysis cannot determine that each node has exactly one OrderID, and the atomized value of OrderID may have more than one item (if it has a list as its schema type). This means that static analysis of distinct-values(doc('orderDetails.xml')//orderDetails/OrderID) cannot determine that each node has exactly one key.

In order to fix this your query can be written as follows:

for $orderId in distinct-values(doc("orderDetails.xml")/OrderDetails/exactly-one(OrderID/data(.)))
let $totalValue := 
   sum(
      for $detail in doc("orderDetails.xml")/OrderDetails[exactly-one(OrderID/data(.)) = $orderId]
      return $detail/Quantity * $detail/UnitPrice
   )
return <order id="{$orderId}" totalValue="{$totalValue}" />

This query then has the same semantics as a group-by and should be optimized as such. As it happens this still doesn't optimize to a group by in XQSharp, so I have filed this as a bug against our software. Whether XmlSpy performs this optimization or not, I couldn't say.

Oliver Hallam