I don't understand why the XPath expression:

//h3[text()='Foo › Bar']

doesn't match:

<h3>Foo &rsaquo; Bar</h3>

Does that seem right? How do I query for that markup?

+1  A: 

From the XPath specification:

XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax

… so unless you are using the query inside a language that resolves that entity (as opposed to using the language to query a document), I wouldn't expect it to work. XSLT with a DTD that declares the entity might do it, if that is possible; I'm far from an XSLT expert.

Use a literal character or an escape sequence recognized by whatever language you are using XPath from.

David Dorward
+2  A: 

XPath does not define any special escape sequences. When XPath is used within XSLT (e.g. in attributes of elements of an XSLT document), character and entity references are resolved by the XML processor that reads the stylesheet, before the XPath engine ever sees the expression. If you use XPath in a non-XML context (e.g. from Java, C# or another language) via a library, and your XPath query is a string literal in that language, you won't get any escape processing aside from that which the language itself usually does.

If this is C# or Java, this should work:

String xpath = "//h3[text()='Foo \u203A Bar']"; // U+203A is ›, i.e. decimal 8250
...
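
For a fuller picture, here is a minimal Java sketch of the idea above, assuming the standard javax.xml.xpath API and a well-formed XML input (real-world HTML would need an HTML-aware parser). The class name RsaquoXPath and the helper countMatches are made up for illustration. The document uses the numeric reference &#8250; because plain XML has no &rsaquo; entity.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RsaquoXPath {
    // Hypothetical helper: parses `xml` and counts the nodes selected by `expression`.
    static int countMatches(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // The XML parser turns &#8250; into the single character U+203A.
        String xml = "<html><body><h3>Foo &#8250; Bar</h3></body></html>";

        // The Java compiler resolves the Unicode escape, so XPath sees the real character.
        System.out.println(countMatches(xml, "//h3[text()='Foo \u203A Bar']"));

        // XPath does no entity processing: this literal is compared as the text "&rsaquo;".
        System.out.println(countMatches(xml, "//h3[text()='Foo &rsaquo; Bar']"));
    }
}
```

The first query should find the heading and the second should not, since the comparison happens on the parsed text rather than on the markup.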

As a side note, it wouldn't work in XSLT either, as XSLT uses XML, which doesn't define a character entity &rsaquo; - it only defines &lt;, &gt;, &quot;, &apos; and &amp;. You'd have to either use a numeric character reference (&#8250; or &#x203A;), or define the character entity yourself in the DOCTYPE declaration of the XSLT stylesheet.
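
A rough sketch of the DOCTYPE route, assuming the XSLT 1.0 processor bundled with the JDK (javax.xml.transform) and its default settings, which as far as I know accept an internal DTD subset in a stylesheet. The class name EntityInStylesheet and the method countMatches are hypothetical. The stylesheet declares rsaquo itself and then uses it inside a select attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EntityInStylesheet {
    // The internal DTD subset declares rsaquo, so the XML parser expands
    // &rsaquo; in the select attribute before the XPath engine sees it.
    static final String STYLESHEET =
          "<!DOCTYPE xsl:stylesheet [ <!ENTITY rsaquo '&#8250;'> ]>"
        + "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"count(//h3[text()='Foo &rsaquo; Bar'])\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: runs the stylesheet over `xml` and returns its text output.
    static String countMatches(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><h3>Foo &#8250; Bar</h3></body></html>";
        System.out.println(countMatches(xml));
    }
}
```

Without the DOCTYPE line the stylesheet is not well-formed XML, because &rsaquo; is then an undeclared entity.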

Pavel Minaev