I have a regex that nicely finds the hrefs of the anchor tags in a page:

<[aA][^>]*? href=[\"'](?<url>[^\"]+?)[\"'][^>]*?>

However, I want it NOT to match any href that contains the text 'javascript:'.

The reason is that I sometimes need to modify the href and sometimes don't. When the href contains 'javascript:', I want the regex to skip it.

(ASP.NET, C#)

+2  A: 

I really wouldn't recommend using a regexp for this, since HTML isn't regular and there is no end of edge cases to cater for. If at all possible, please use an HTML parser. I think you'll find it a lot less grief.

Brian Agnew
It works in this case (http://BiblePro.BibleOcean.com) because I control the HTML structure. It's a single-page AJAX app.
BahaiResearch.com
A: 

The word 'javascript' can be written in other ways; have a look at the ha.ckers.org article. Simply excluding the word 'javascript' doesn't give you any safety at all.
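
To illustrate, here is a rough sketch rather than a definitive solution. It takes the pattern from the question and adds a negative lookahead, `(?!\s*javascript:)`, right after the opening quote, so hrefs that literally start with 'javascript:' are not matched. The class and method names (`HrefFinder`, `FindHrefs`) are made up for this example, and `RegexOptions.IgnoreCase` is my addition so that 'JavaScript:' is skipped too (it also makes `href` itself case-insensitive). The third link shows the problem: an entity-encoded spelling is not caught by the lookahead, although a browser would decode it to 'javascript:'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class HrefFinder
{
    // The question's pattern with one change: the negative lookahead
    // (?!\s*javascript:) placed immediately after the opening quote.
    private static readonly Regex AnchorHref = new Regex(
        @"<[aA][^>]*? href=[""'](?!\s*javascript:)(?<url>[^""]+?)[""'][^>]*?>",
        RegexOptions.IgnoreCase);

    // Returns the captured "url" group of every anchor tag that matches.
    public static List<string> FindHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html =
            "<a href=\"page.html\">plain link</a>" +
            "<a href=\"javascript:void(0)\">skipped by the lookahead</a>" +
            "<a href=\"jav&#x61;script:alert(1)\">encoded, NOT skipped</a>";

        // Prints:
        //   page.html
        //   jav&#x61;script:alert(1)
        foreach (string url in FindHrefs(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

So the lookahead is fine for telling your own 'javascript:' links apart from the ones you want to modify, as long as you control the markup, but it should not be treated as a filter against hostile input.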

hsz