
Let's say I have a full HTML document as XML input.
How would the XSLT file look if I only want to output the first (or any) image from the HTML?

+3  A: 

The XPath expression that retrieves the first image from an HTML page is: (//img)[1].

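See the answer from @Dimitre Novatchev for more on the pitfalls: why //img[1] is not equivalent, and why the expression finds nothing in a document with a default namespace such as XHTML.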

Oded
See my answer for an explanation of the issue with your answer. Read the XPath spec, in particular the definition of the `//` abbreviation, and search for this frequent mistake. If the problem is still not clear, ask a separate question and many people will be glad to explain. :)
Dimitre Novatchev
@Oded: **This answer is WRONG!** See my answer for an explanation.
Dimitre Novatchev
I definitely don't want to read the XPath spec, but I see what Dimitre is saying: `//img[1]` selects the first img element of *each* parent, rather than the first one in the document.
Jweede
@Dimitre Novatchev - thanks for the correction. Answer updated.
Oded
Glad to see. :) I cancelled my downvote.
Dimitre Novatchev
+3  A: 

One XPath expression that selects the first <img> element in a document is:

(//img)[1]
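The thread is about XSLT, but the selection can be tried quickly from Python's standard library as a sketch. ElementTree supports only a subset of XPath, so (//img)[1] cannot be written literally; as an assumption-labeled stand-in, `find()` returns just the first match of `.//img` in document order, which is the node (//img)[1] selects (as long as the root element is not itself an img). The sample page and the `first_image` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (plain HTML, no namespace) with <img>
# elements under two different parents.
PAGE = """<html>
  <body>
    <p><img src="a.png"/><img src="b.png"/></p>
    <div><img src="c.png"/></div>
  </body>
</html>"""


def first_image(root):
    """Return the first <img> descendant of root in document order, or None.

    ElementTree has no (//img)[1] syntax; find() returns only the first
    match of the path, which is the same node that expression selects.
    """
    return root.find(".//img")


root = ET.fromstring(PAGE)
print(first_image(root).get("src"))  # a.png
```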

Note a frequent mistake, made by @Oded in his answer: suggesting the following XPath expression, which in general may select more than one element:

//img[1] (: WRONG !!! :)

This selects every <img> element in the document that is the first <img> child of its parent.
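The difference can be observed with Python's ElementTree, whose positional predicate is evaluated against each element's own parent, the same way XPath evaluates //img[1]. This is a minimal sketch on a hypothetical page with two sibling images in one parent and a third image in another.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two sibling <img> elements in <p>, one in <div>.
PAGE = """<html>
  <body>
    <p><img src="a.png"/><img src="b.png"/></p>
    <div><img src="c.png"/></div>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Every <img> in the document, in document order.
all_imgs = [e.get("src") for e in root.findall(".//img")]

# The [1] predicate is applied per parent: an <img> is kept when it is
# the first <img> child of its own parent, as with XPath's //img[1].
first_children = [e.get("src") for e in root.findall(".//img[1]")]

print(all_imgs)        # ['a.png', 'b.png', 'c.png']
print(first_children)  # ['a.png', 'c.png'], i.e. two nodes, not one
```

Only when every image shares a single parent do the two forms happen to agree, which is why the mistake often goes unnoticed on small test pages.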

The W3C XPath 1.0 Recommendation explains this frequent mistake exactly:

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

A further problem arises if the document defines a default namespace, as is normally the case with XHTML. In XPath 1.0 an unprefixed name in a name test matches only elements in no namespace, so (//img)[1] selects no node: the document contains no img element that is in no namespace.

In this case there are two ways to specify the wanted XPath expression:

  1. (//x:img)[1], where the prefix x is bound (by the hosting language) to the document's default namespace (here, the XHTML namespace).

  2. (//*[name()='img'])[1]
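The namespace effect can also be reproduced with Python's ElementTree, sketched here on a hypothetical XHTML snippet. Binding the prefix `x` through the `namespaces` argument of `find()` mirrors option 1. For option 2 there is no `name()` function in ElementTree; as a rough analogue (an assumption, not the same test), the `{*}` wildcard, available since Python 3.8, matches the local name `img` in any namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML page: the default namespace applies to every element.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><img src="a.png"/></p>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Unprefixed name: matches only elements in no namespace, so nothing is found.
plain = root.find(".//img")

# Option 1: bind prefix "x" to the XHTML namespace, like (//x:img)[1].
prefixed = root.find(".//x:img", {"x": XHTML_NS})

# Rough analogue of option 2: match the local name in any namespace.
wildcard = root.find(".//{*}img")

print(plain)                # None
print(prefixed.get("src"))  # a.png
print(wildcard.get("src"))  # a.png
```

Unprefixed attributes such as `src` are not in the default namespace, so `get("src")` works unchanged once the element itself is found.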

Dimitre Novatchev
Is your 'wrong' example missing a `[1]`?
AakashM
@AakashM: Thanks, corrected.
Dimitre Novatchev