Let's say I have a full HTML document as XML input.
How would the XSLT file look if I only want to output the first (or any) image from the HTML?
The XPath expression that will retrieve the first image from an HTML page is (//img)[1].
See the answer from @Dimitre Novatchev for more information on problems with it.
One XPath expression that selects the first <img> element in a document is:
(//img)[1]
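For the "or any" part of the question, (//img)[n] picks the n-th image in document order. Here is a rough sketch of the same idea in Python's standard xml.etree.ElementTree, used only as a convenient host. The sample page and the helper name nth_image are made up for illustration, and ElementTree does not accept the parenthesized form itself, so the sketch walks the descendants in document order instead.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Hypothetical sample page used only for this sketch.
PAGE = """
<html>
  <body>
    <div><img src="a.png"/><img src="b.png"/></div>
    <p><img src="c.png"/></p>
  </body>
</html>
"""

def nth_image(root, n):
    """Return the n-th <img> (1-based, document order), or None if there are fewer than n."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # root.iter("img") yields matching elements in document order,
    # which is the order (//img)[n] indexes into.
    return next(islice(root.iter("img"), n - 1, None), None)

root = ET.fromstring(PAGE)
first = nth_image(root, 1)
third = nth_image(root, 3)
print(first.get("src"), third.get("src"))
```

Asking for an index past the last image returns None rather than raising, which mirrors an XPath expression selecting an empty node-set.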
Note that a frequent mistake, as made by @Oded in his answer, is to suggest the following XPath expression, which in general may select more than one element:
//img[1]   (: WRONG !!! :)
This selects all <img> elements in the document, each one of which is the first <img> child of its parent.
Here is the exact explanation of this frequent mistake in the W3C XPath 1.0 Recommendation:
NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.
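The difference is easy to see on a small document. The sketch below uses Python's standard xml.etree.ElementTree purely as a test bench; the sample markup is invented. ElementTree implements only a subset of XPath and does not accept (//img)[1], so find(".//img"), which returns the first match in document order, stands in for it here. Its handling of .//img[1] follows the per-parent meaning described in the note.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: two parents, each with its own <img> children.
HTML = """
<html>
  <body>
    <div><img src="a.png"/><img src="b.png"/></div>
    <div><p>text</p><img src="c.png"/></div>
  </body>
</html>
"""

root = ET.fromstring(HTML)

# Same meaning as //img[1]: every <img> that is the first <img> child of its parent.
per_parent_first = [img.get("src") for img in root.findall(".//img[1]")]

# Stand-in for (//img)[1]: the single first <img> in document order.
first_in_document = root.find(".//img").get("src")

print(per_parent_first)    # ['a.png', 'c.png']
print(first_in_document)   # a.png
```

Two images come back from the per-parent form because each <div> has its own first <img> child, while the document-order form yields exactly one.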
A further problem exists if the document has defined a default namespace, which must be the case with XHTML. XPath treats any unprefixed name as belonging to no namespace and the expression (//img)[1] selects no node, because there is no element in the document that belongs to no namespace and has name img.
In this case there are two ways to specify the wanted XPath expression:
(//x:img)[1], where the prefix x is associated (by the hosting language) with the document's default namespace (in this case, the XHTML namespace).
(//*[name()='img'])[1]
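As an illustration of the namespace issue, here is a small sketch with Python's standard xml.etree.ElementTree as the hosting language; the sample document is invented. The prefix x is bound through the dictionary passed to find(). ElementTree has no name() function, so the second variant is approximated with the {*} wildcard (Python 3.8+), which matches on the local name in any namespace and is therefore only roughly comparable to the name() test.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML page with the usual default namespace declaration.
XHTML = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <div><img src="a.png"/></div>
    <div><img src="b.png"/></div>
  </body>
</html>
"""

root = ET.fromstring(XHTML)

# Unprefixed name: looks for an <img> in no namespace, so nothing matches.
unprefixed = root.find(".//img")

# Prefix "x" bound to the XHTML namespace by the host (here, the dict argument).
prefixed = root.find(".//x:img", {"x": XHTML_NS})

# Namespace-agnostic analogue: local name "img" in any namespace.
any_ns = root.find(".//{*}img")

print(unprefixed)            # None
print(prefixed.get("src"))   # a.png
print(any_ns.get("src"))     # a.png
```

The prefix itself is arbitrary; only the namespace URI it is bound to has to match the one declared in the document.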