Hi,

I'm trying to learn XPath, and I'm having trouble doing a nested search (using contains).

Specifically, I was given the following question:

There is a list of authors and a list of books, structured according to the following DTD:

<!ELEMENT db1 (book*, author*)>
<!ELEMENT book (title)>
<!ATTLIST book
    bid ID #REQUIRED
    authors IDREFS #REQUIRED
>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST author
    aid ID #REQUIRED
>

Write an XPath expression that returns the number of authors who wrote books. You may assume that no author id contains another author id as a substring.

I tried many things, but I keep getting an error of "Too many items in contains". I am trying to run something like this:

//author/@aid[contains(//book/@authors/string(.),  string(.))]

I am using the following xml file as an example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE db1 SYSTEM "C:\blabla\db1.dtd">
<db1>
<book authors="a1 a3 a4" bid="b1">
<title>Book 1</title>
</book>
<book authors="a1 a2 a3" bid="b2">
<title>Book 2</title>
</book>
<book authors="a4" bid="b3">
<title>Book 3</title>
</book>
<author aid="a1"></author>
<author aid="a91"></author>
<author aid="a2"></author>
<author aid="a88"></author>
<author aid="a3"></author>
<author aid="a4"></author>
<author aid="a5"></author>
<author aid="a6"></author>

</db1>

The expected answer should be

a1 a2 a3 a4

Any advice?

Thanks.

+2  A: 

I found the answer I was looking for. It is actually not that difficult; you just need to be familiar with XPath's id() function.

The XPath query for this is: count(id(//book/@authors))

The list of authors is given by id(//book/@authors). Note that this expression returns the matching author elements themselves, not just their ids:

<author aid="a1"/>
<author aid="a2"/>
<author aid="a3"/>
<author aid="a4"/>

See reference.

The function contains is not applicable in this case, but luckily, it is also not really necessary.
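As a rough illustration of what the contains-based attempt was aiming for, here is a Python sketch using only the standard library. The likely cause of the "Too many items" error is that contains expects a single string as its first argument, while //book/@authors/string(.) yields one string per book (three in the sample file), so the test has to be applied one book at a time. The helper names substring_match and token_match are made up for this example. The sketch also shows why the exercise adds the assumption that no author id contains another: a substring test and a whole-token test can disagree.

```python
import xml.etree.ElementTree as ET

# Sample data from the question (DOCTYPE omitted; ElementTree does not need it).
XML = """<db1>
<book authors="a1 a3 a4" bid="b1"><title>Book 1</title></book>
<book authors="a1 a2 a3" bid="b2"><title>Book 2</title></book>
<book authors="a4" bid="b3"><title>Book 3</title></book>
<author aid="a1"/><author aid="a91"/><author aid="a2"/><author aid="a88"/>
<author aid="a3"/><author aid="a4"/><author aid="a5"/><author aid="a6"/>
</db1>"""

def substring_match(aid, book_refs):
    # Hypothetical helper: contains()-style test, applied to one book at a time.
    return any(aid in refs for refs in book_refs)

def token_match(aid, book_refs):
    # Hypothetical helper: compare against whole whitespace-separated tokens.
    return any(aid in refs.split() for refs in book_refs)

root = ET.fromstring(XML)
book_refs = [b.get("authors", "") for b in root.iter("book")]
author_ids = [a.get("aid") for a in root.iter("author")]

by_substring = [aid for aid in author_ids if substring_match(aid, book_refs)]
by_token = [aid for aid in author_ids if token_match(aid, book_refs)]

print(by_substring)  # ['a1', 'a2', 'a3', 'a4']
print(by_token)      # ['a1', 'a2', 'a3', 'a4']
```

On the sample file both tests agree. They would differ for an id such as "a", which is a substring of "a1 a3" but not one of its tokens.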

The id function selects elements by their unique ID. When the argument to id is of type node-set, then the result is the union of the result of applying id to the string-value of each of the nodes in the argument node-set. When the argument to id is of any other type, the argument is converted to a string as if by a call to the string function; the string is split into a whitespace-separated list of tokens (whitespace is any sequence of characters matching the production S); the result is a node-set containing the elements in the same document as the context node that have a unique ID equal to any of the tokens in the list.
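The behaviour described above can be imitated by hand, which may help make it concrete. The following is only a sketch in Python: the standard library's ElementTree supports a limited XPath subset without id(), so the code hard-codes the knowledge that authors is the IDREFS attribute and aid is the ID attribute, which a real XPath engine would take from the DTD. The function name authors_with_books is invented for this example.

```python
import xml.etree.ElementTree as ET

# Sample data from the question (DOCTYPE omitted; the ID/IDREFS roles of
# "aid" and "authors" are hard-coded below instead of read from the DTD).
XML = """<db1>
<book authors="a1 a3 a4" bid="b1"><title>Book 1</title></book>
<book authors="a1 a2 a3" bid="b2"><title>Book 2</title></book>
<book authors="a4" bid="b3"><title>Book 3</title></book>
<author aid="a1"/><author aid="a91"/><author aid="a2"/><author aid="a88"/>
<author aid="a3"/><author aid="a4"/><author aid="a5"/><author aid="a6"/>
</db1>"""

def authors_with_books(root):
    # Step 1: split each book's "authors" value on whitespace and take the
    # union of all tokens, as id() does for a node-set argument.
    referenced = set()
    for book in root.iter("book"):
        referenced.update(book.get("authors", "").split())
    # Step 2: keep the author elements whose id is one of those tokens,
    # in document order.
    return [a for a in root.iter("author") if a.get("aid") in referenced]

root = ET.fromstring(XML)
matched = authors_with_books(root)
matched_ids = [a.get("aid") for a in matched]

print(len(matched), matched_ids)  # 4 ['a1', 'a2', 'a3', 'a4']
```

The length of the list corresponds to count(id(//book/@authors)), and the list itself to id(//book/@authors).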

Anna
That query does not return the expected result. What's up with that?
Peter Lindqvist
I edited the example a little to clarify what I had in mind. Hope it is less confusing now.
Anna
@Anna glad you figured this out! This wasn't the direction I took it in, but the way you did it works for me too. I built a simple test harness in Python using libxml2; I could post it if anyone wants to see it.
AJ
It is worth noting that this can only work with an ID/IDREFS relation in place, declared either by a DTD or, presumably, by XML Schema. Without that, the expression will return the empty node-set. (Oh, and +1 - I didn't know about `id()`)
Tomalak