Hi, I am trying to write a regexp that removes file paths from links and images.

href="path/path/file" to href="file"
href="/file" to href="file"
src="/path/file" to src="file"

and so on...

I thought that I had it working, but it messes up if there are two paths in the string it is working on. I think my expression is too greedy. It finds the very last file in the entire string.

This is my code that shows the expression messing up on the test input:

<script type="text/javascript" src="/javascripts/jquery.js"></script>
<script type="text/javascript">
    $(document).ready(function(){
        var s = '<a href="one/keepthis"><img src="/one/two/keep.this"></a>';
        var t = s.replace(/(src|href)=("|').*\/(.*)\2/gi,"$1=$2$3$2");
        alert(t);
    });
</script>

It gives the output:

<a href="keep.this"></a>

The correct output should be:

<a href="keepthis"><img src="keep.this"></a>

Thanks for any tips!

A: 

I would suggest running separate regex replacements, one for a links and another for img tags. That is easier and clearer, and thus more maintainable.

Zhang Yining
Thanks, I might have to do this. I'm testing a new version that appears to work a little better: /(src|href)="([^"]*\/)*\/?([^"]*)"/gi, "$1=\"$3\"". I'll have to run it through tests, and make it work with " or '.
Moasely
A: 

This seems to work in case anyone else has the problem:

var t = s.replace(/(src|href)=('|")([^ \2]*\/)*\/?([^ \2]*)\2/gi,"$1=$2$4$2");
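
One caveat for anyone reusing this: inside a character class, \2 is not a backreference. In a non-Unicode JavaScript regex it is read as an octal escape for the character U+0002, so [^ \2] really means "anything except a space or U+0002", and the expression gets the right answer on the test input only because of backtracking. Below is a minimal sketch of one way to actually exclude the opening quote, using a negative lookahead. The function name stripPaths is made up for illustration and is not part of the original code.

```javascript
// Hypothetical helper (name chosen for this sketch only).
function stripPaths(html) {
  // (?!\2). matches any single character that is not the opening quote,
  // so the match can never run past the end of the attribute value.
  // The optional group consumes everything up to the last slash (if any);
  // group 3 captures what is left, i.e. the file name.
  var re = /(src|href)=(["'])(?:(?:(?!\2).)*\/)?((?:(?!\2)[^\/])*)\2/gi;
  return html.replace(re, "$1=$2$3$2");
}

var s = '<a href="one/keepthis"><img src="/one/two/keep.this"></a>';
console.log(stripPaths(s));
// prints: <a href="keepthis"><img src="keep.this"></a>

// The class [^ \2] does not exclude quotes: this logs true.
console.log(/[^ \2]/.test('"'));
```

This version also leaves values without any slash (for example href='file') unchanged, since the path part is optional.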
Moasely
A: 

Try adding ? to make the * quantifiers non-greedy. You want them to stop matching when they encounter the ending quote character. The greedy versions will barrel right on past the ending quote if there's another quote later in the string, finding the longest possible match; the non-greedy ones will find the shortest possible match.

/(src|href)=("|').*?\/([^/]*?)\2/gi

Also I changed the second .* to [^/]* to allow the first .* to still match the full path now that it's non-greedy.
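
To make the difference concrete, here is a small sketch that runs the original greedy expression and the non-greedy one side by side on the test string from the question. The variable names are only for illustration.

```javascript
var s = '<a href="one/keepthis"><img src="/one/two/keep.this"></a>';

// Greedy: .* runs as far as it can, so one match swallows both attributes.
var greedy = s.replace(/(src|href)=("|').*\/(.*)\2/gi, "$1=$2$3$2");

// Non-greedy: .*? and [^/]*? stop at the first closing quote that works.
var lazy = s.replace(/(src|href)=("|').*?\/([^/]*?)\2/gi, "$1=$2$3$2");

console.log(greedy); // prints: <a href="keep.this"></a>
console.log(lazy);   // prints: <a href="keepthis"><img src="keep.this"></a>
```

One limitation to be aware of: the pattern requires at least one slash, so if an attribute value contains none (say href="file"), .*? can still run past the closing quote until it finds a slash later in the string. If such values can occur in the input, the pattern would need to be tightened further.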

John Kugelman