First, I can assume that all urls that end with jpeg, jpg, bmp, png or gif are images, and others aren't.

I thought of, and tried two solutions:

  • Matching the regular expression \.(jpe?g|bmp|png|gif)$
  • Using ends-with to check each separately

But it appears that neither of these exists in XPath 1.0, or at least, they don't exist in Firefox (I am writing a Greasemonkey script, so the path only needs to work in Firefox).

Sorry about the title, SO didn't think "Find all links to images in XPath 1.0" was unique enough.

A: 

It's going to be difficult because XPath does not have an ends-with() function, which you would need to use to check the end of the HREF attribute for your relevant file extensions. You will have to resort to using substring-after().

It's not going to be pretty, something like...

fn:substring-after(a[@href], '.') = 'jpg'

EDIT: It appears that substring-after is not a part of XPath 1.0, so you'll have to use the even uglier substring and string-length

Josh Stodola
http://www.w3.org/TR/xpath/#section-String-Functions would suggest that XPath 1.0 *does* have `substring-after` in its specification
Dancrumb
XPath 1.0 does of course have a `substring-after()` function.
Tomalak
You guys are right, thanks. The answer I linked to (which was accepted) indicated that it did not. Of course I left a comment on there to set the record straight! Thank you!
Josh Stodola
@Josh-Stodola, Where did I say that it doesn't have the substring-after() function? XPath 1.0 doesn't allow a function to be specified as a location step, regardless of whether this is substring-after() or any other function. Please revert your incorrect downvote!
Dimitre Novatchev
@Josh-Stodola, Please, try to understand what you are reading, before downvoting correct answers!
Dimitre Novatchev
@Josh-Stodola, You not only left a comment, indicating that you didn't understand the answer, but you downvoted a correct answer.
Dimitre Novatchev
@Josh-Stodola, Should someone tell you that what you did is unethical and plain wrong?
Dimitre Novatchev
@Dimitry I think you need to take a chill pill. I did not down-vote your answer, but it made it sound like that function does not exist in XPath 1.0 so I left that comment. Please think about what you are going to say to me before making rash assumptions and being rude and childish about it.
Josh Stodola
@Dimitre And regardless, you are still wrong, so maybe I will go down-vote you just for spite (because I can!). This *can* be achieved using a single XPath expression, see the answer to *this* question to find out how.
Josh Stodola
The answer to *which* question?
Dimitre Novatchev
The one you are currently commenting on.
Josh Stodola
+2  A: 

You can use a combination of substring and string-length (both of which are in XPath 1.0) to simulate ends-with. It's not pretty, but it works:

substring(@href, string-length(@href) - 3 + 1, 3) = 'jpg'

(the 3s here are the length of jpg; the 1 is to adjust for substring's 1-based indexing)

should have the same truth value as

ends-with(@href, 'jpg')

I assume from your question that you know how to check for each possible extension separately.
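
To avoid writing the arithmetic by hand for every extension, the test can be generated. This is only a minimal sketch in JavaScript (the language of a Greasemonkey script): the helper name `endsWithXPath` is made up for illustration, and it assumes the suffix contains no single quote.

```javascript
// Hypothetical helper: builds an XPath 1.0 test equivalent to ends-with(attr, suffix).
// Assumes `suffix` contains no single quote character.
function endsWithXPath(attr, suffix) {
  const n = suffix.length;
  // substring() is 1-based, so the last n characters start at string-length - n + 1.
  return "substring(" + attr + ", string-length(" + attr + ") - " + n +
    " + 1, " + n + ") = '" + suffix + "'";
}

// One test per extension from the question, joined with "or".
const extensions = [".jpeg", ".jpg", ".bmp", ".png", ".gif"];
const predicate = extensions.map((ext) => endsWithXPath("@href", ext)).join(" or ");
const imageLinksXPath = "//a[" + predicate + "]";
```

Including the dot in each suffix keeps a URL such as `/notajpg` from matching, and because the length is taken from the suffix itself, `.jpeg` needs no special handling.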

AakashM
+2  A: 

Although you're asking for an XPath solution, an alternative approach would be to use something like jQuery or Prototype, which uses CSS Selectors to select elements.

With jQuery, for instance, you could use:

$("a[href$='jpg'],a[href$='gif'],a[href$='png']").each(functionOfChoice);
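
If pulling in a library is not an option, roughly the same thing can be sketched in plain JavaScript by selecting every link and applying the regular expression from the question there instead of in XPath. The helper name `isImageUrl` is hypothetical, and the case-insensitive flag is an addition the question did not ask for.

```javascript
// The pattern from the question with the dot escaped; the i flag makes the
// match case-insensitive (an assumption, not part of the question).
const IMAGE_EXTENSION = /\.(jpe?g|bmp|png|gif)$/i;

// Hypothetical helper: true when the URL string ends with an image extension.
function isImageUrl(href) {
  return IMAGE_EXTENSION.test(href);
}

// In a page, collect every link first and filter in JavaScript.
// Guarded so the snippet also loads outside a browser.
if (typeof document !== "undefined") {
  const imageLinks = Array.from(document.querySelectorAll("a[href]"))
    .filter((a) => isImageUrl(a.getAttribute("href")));
  console.log(imageLinks.length + " image links found");
}
```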
Dancrumb
+2  A: 

There are no regular expressions in XPath 1.0, nor is there an ends-with() function.

URLs cannot contain spaces without becoming invalid, so you can use them to achieve an end-of-string matching. You could do this:

//a[
  contains(concat(@href, ' '), '.jpg ') or
  contains(concat(@href, ' '), '.bmp ') or
  contains(concat(@href, ' '), '.png ')
]

or this (as @AakashM suggests):

//a[
  substring(@href, string-length(@href) - 2, 3) = 'jpg' or
  substring(@href, string-length(@href) - 2, 3) = 'bmp' or
  substring(@href, string-length(@href) - 2, 3) = 'png'
]

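I think the latter option would perform a bit better, while the former is shorter and somewhat more pleasing to look at. Which one to pick depends on whether you really need the performance here. Either way, add further "or" branches for the remaining extensions; jpeg has four characters, so the substring variant needs substring(@href, string-length(@href) - 3, 4) = 'jpeg' for it.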
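
For a Greasemonkey script, the expression still has to be run through `document.evaluate`. The following is a rough sketch of that step, not a definitive implementation: `selectNodes` is a hypothetical helper name, and the expression is the substring variant from this answer written as one string.

```javascript
// The substring-based expression as a single string; add more "or"
// branches for other extensions as needed.
const IMAGE_LINKS_XPATH =
  "//a[" +
  "substring(@href, string-length(@href) - 2, 3) = 'jpg' or " +
  "substring(@href, string-length(@href) - 2, 3) = 'bmp' or " +
  "substring(@href, string-length(@href) - 2, 3) = 'png'" +
  "]";

// Hypothetical helper: evaluates an XPath expression against `doc` and
// returns the matching nodes as a plain array.
function selectNodes(doc, expression) {
  const snapshot = doc.evaluate(
    expression,
    doc,   // context node
    null,  // no namespace resolver needed for plain HTML
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null   // let the browser create a new result object
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Only runs in a browser or Greasemonkey context.
if (typeof document !== "undefined") {
  selectNodes(document, IMAGE_LINKS_XPATH).forEach((a) => console.log(a.href));
}
```

A snapshot result is used so that the links can be modified while looping without invalidating the result.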

Tomalak