You need to use capturing parentheses to get sub-expression matches:
`match(/href=".*?(\/pdf\/.*?\.pdf)/)[1];`
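`match` returns an array with the entire match at index 0, followed by one entry per capture group, numbered in the order their opening parentheses appear in the pattern. In this case there is a single group, so index 1 contains the part matching `\/pdf\/.*?\.pdf`. If the pattern doesn't match at all, `match` returns `null`, so indexing it with `[1]` throws a TypeError.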
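A minimal sketch of what `match` returns, using a made-up `html` string (the input and variable names are illustrative only). Without the `g` flag, `String.prototype.match` returns the same array as `RegExp.prototype.exec`: the whole match first, then each capture group. The pattern is written with a plain `href`; `\h` is not a defined JavaScript escape, so it only behaves as a literal `h` when the `u` flag is absent.

```javascript
// Hypothetical input: one anchor tag whose href points to a PDF.
const html = '<a href="docs/pdf/report.pdf">Report</a>';

// The parentheses create capture group 1 around the /pdf/... part.
const result = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// match() returns null when there is no match, so check before indexing.
if (result !== null) {
  console.log(result[0]); // whole match:     href="docs/pdf/report.pdf
  console.log(result[1]); // capture group 1: /pdf/report.pdf
}

// A string with no PDF link produces null rather than an array.
const noMatch = '<a href="page.html">Page</a>'.match(/href=".*?(\/pdf\/.*?\.pdf)/);
console.log(noMatch); // null
```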
Try to make your regex more specific than `.*?` if it's matching too broadly. For instance:
`match(/href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/)[1];`
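This version captures the entire href value (for example `dir/pdf/file.pdf`), not just the part from `/pdf/` onward. Inside it, `[^"]+?` lazily matches a run of characters that contains no double quote, so the match can't run past the closing quote of one attribute into the next. That keeps it from matching too broadly in a string like this: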
`<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>`
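To see the difference on the two-link string above, here is a small sketch that runs both patterns against it (variable names are illustrative). The broad pattern starts at the first `href` and lets `.*?` run past its closing quote until it finds `/pdf/`; the specific pattern fails at the first `href` because `[^"]+?` hits the quote, so the engine moves on and succeeds at the second one.

```javascript
// The two-link string from the example above.
const html =
  '<a href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf">Some PDF</a>';

// Broad pattern: .*? may cross the closing quote of the first href.
const broad = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// Specific pattern: [^"]+? cannot cross a double quote, so the match
// has to stay inside a single href value.
const specific = html.match(/href="([^"]+?\/pdf\/[^\.]+?\.pdf)"/);

console.log(broad[0]);
// href="someurl/somepage.html">Test</a><a href="dir/pdf/file.pdf
console.log(specific[0]);
// href="dir/pdf/file.pdf"
console.log(specific[1]);
// dir/pdf/file.pdf
```

One trade-off of the specific pattern: `[^\.]+?` also rejects file names that contain a dot, such as `dir/pdf/v1.2.pdf`.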