I need to do a non-greedy match and hope someone can help me. I am using JavaScript and ASP, and I have the following:

match(/\href=".*?\/pdf\/.*?\.pdf/)

The pattern above starts matching at the first href in the document. I need it to match only the last href, the one that points into the /pdf/ folder.

Any ideas?

+1  A: 

You need to use capturing parentheses for sub-expression matches:

match(/\href=".*?(\/pdf\/.*?\.pdf)/)[1]; 

match returns an array with the entire match at index 0; each sub-expression capture follows, in the order its opening parenthesis appears in the pattern. In this case, index 1 contains the section matching \/pdf\/.*?\.pdf.
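
As a minimal sketch of what that array looks like, the snippet below runs the pattern against a made-up one-link string (the `html` value and the variable names are assumptions for illustration, not part of the original page). The pattern is written without the backslash before `h`, which matches the same text.

```javascript
// Hypothetical input for illustration; not taken from the asker's document.
const html = '<a href="docs/pdf/report.pdf">Report</a>';

// One capturing group around the /pdf/... part of the URL.
const result = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// match() returns null when nothing matches, so guard before indexing.
const whole = result ? result[0] : null;    // entire match
const captured = result ? result[1] : null; // first capture group

console.log(whole);    // href="docs/pdf/report.pdf
console.log(captured); // /pdf/report.pdf
```

Indexing with [1] directly, as in the one-liner above, throws a TypeError when there is no match, so the null check is worth keeping if the input is not guaranteed to contain a PDF link.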


Try to make your regex more specific than just .*? if it's matching too broadly. For instance:

match(/\href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/)[1];

[^"]+? lazily matches a run of characters that contains no double quote. This keeps the match inside the quotes, so it won't be too broad in a string such as the following:

<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>
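
To make the difference concrete, here is a small sketch that runs both a .*? version and the [^"]+? version against that sample string. The variable names `broad` and `specific` are made up for this illustration, and the leading backslash before `h` is omitted since it does not change what is matched.

```javascript
// The sample markup from the answer.
const html =
  '<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>';

// Broad pattern: .*? is lazy, but it can still run past the closing quote
// of the first href in order to reach /pdf/ further along.
const broad = html.match(/href="(.*?\/pdf\/.*?\.pdf)/);

// Specific pattern: [^"]+? cannot cross a double quote, so the attempt at
// the first href fails and the match begins at the second one.
const specific = html.match(/href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/);

console.log(broad[1]);
// someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf
console.log(specific[1]);
// dir/pdf/file.pdf
```

Laziness only controls how far a quantifier extends once a match attempt has started; it does not move the starting point, which is why the broad version still begins at the first href.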
Andy E
This gives me /pdf/filename.pdf, but I need to get <a href="somedirectories/pdf/filename.pdf". I am stripping the link out of HTML code, and there are references to other hrefs higher up in the document, so I figure I need some sort of pattern match.
Gerald Ferreira
@Gerald: Sorry, I didn't realize that is what you meant. I've updated my answer.
Andy E
Perfect, exactly what I was looking for. Thanks, Andy.
Gerald Ferreira