I'm using the following regular expression

<a href="[^/]

to find all links which do not start with a slash. I want to use the result of this regex to replace all <a href="somelink.html"> tags with something like <a href="http://mysite.com/somelink.html">.

But the problem with my regular expression is that (in the above example) the string <a href="s gets replaced instead of <a href=".

How can I fix this regular expression to avoid including the last character in my match?

I'm using the .NET Regex class for this, currently with the following code:

content = Regex.Replace(content, "(<a href=\")[^/]", "<a href=\"http://mysite.com/");

Maybe I should change something there? But I'd rather have a good regular expression, if possible, than start playing around with Substring etc.

+2  A: 

Don't use regex to parse HTML. Use HTML Agility Pack. It will make your life easier.

If you insist on using regex, try a negative lookahead:

<a href="(?!/)
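Dropped into the Regex.Replace call from the question, that pattern might look like the sketch below. The class and method names (LinkRewriter, PrefixRelativeLinks) are made up for illustration, and http://mysite.com/ is just the placeholder from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Hypothetical helper, not part of any library.
    public static string PrefixRelativeLinks(string content)
    {
        // (?!/) only asserts that the next character is not a slash.
        // It consumes nothing, so the match ends right after the opening quote.
        return Regex.Replace(
            content,
            "<a href=\"(?!/)",
            "<a href=\"http://mysite.com/");
    }
}
```

Calling LinkRewriter.PrefixRelativeLinks on <a href="somelink.html"> should give <a href="http://mysite.com/somelink.html">, while <a href="/rooted.html"> is left alone. Note that, like the original pattern, this also prefixes links that are already absolute (href="http://..."), so those would need to be excluded separately.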
Mark Byers
Obligatory reference: http://stackoverflow.com/questions/1732454 :-P
Chris Jester-Young
A: 

If you have to use a regex, look up look-ahead assertions (or the equivalent) in the manual. In Perl the syntax is (?=pattern), so your pattern becomes

  <a href="(?=[^/])

It will match if the pattern is followed by [^/], without including it in the match.
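.NET accepts the same (?=...) syntax, so a rough sketch of this variant in the question's Regex.Replace call could be the following. PositiveLookaheadDemo and PrefixRelativeLinks are hypothetical names, and the base URL is the placeholder from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PositiveLookaheadDemo
{
    // Hypothetical helper, not part of any library.
    public static string PrefixRelativeLinks(string content)
    {
        // (?=[^/]) requires that a non-slash character follows,
        // but leaves that character out of the match.
        return Regex.Replace(
            content,
            "<a href=\"(?=[^/])",
            "<a href=\"http://mysite.com/");
    }
}
```

One small difference from the negative look-ahead (?!/): the positive form needs an actual character after the quote, so an <a href=" sitting at the very end of the input is not matched here, whereas (?!/) would match it.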

Dan Andreatta