I am working on an ASP.NET response filter that rewrites URLs to point to a different domain in specific situations.

Because ASP.NET writes the response in chunks, my filter gets called several times before the page is fully streamed. This means I need to make sure that each call to Regex.Replace doesn't replace a URL twice (otherwise you end up with http://foo.comhttp://foo.com/path).

To do this, I'm trying to use a negative lookbehind expression for the replace, but it doesn't seem to be working:

    content = Regex.Replace(content,"((?<!" + newDomain + ")" + match + ")", newDomain + match);

This creates a regex like:

    ((?<!http://www.foo.com/)actual/url)

However, the lookbehind does not seem to be respected, and everything gets replaced twice.

Any ideas?

EDIT: This regex works great when I use a tool like Regex Coach to test it against sample data.

EDIT 2: Added the slash, it is actually there.

+1  A: 

A couple of thoughts:

  • Do you need to escape the . in the regex? I don't know the <! syntax and don't have my books to hand so this may be a moot point.
  • I don't see how it would match http://www.foo.com/something as there is no / after the www.foo.com in your example.

Hope some of that is of help.
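
On the escaping point, here is a minimal sketch of building the pattern with Regex.Escape so that each dot in the domain is treated as a literal character. The values of newDomain and match are hypothetical stand-ins for the question's variables, and this only shows the escaping step; it is not a confirmed fix for the double replacement.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeDemo
{
    static void Main()
    {
        // Hypothetical values standing in for the question's variables.
        string newDomain = "http://www.foo.com/";
        string match = "actual/url";

        // Regex.Escape turns "." into "\." so it matches only a literal dot
        // instead of any character.
        string pattern = "(?<!" + Regex.Escape(newDomain) + ")" + Regex.Escape(match);

        // Print the pattern to inspect what the regex engine will receive.
        Console.WriteLine(pattern);
    }
}
```

Without the escaping, the unescaped dots would also accept other characters in those positions, which is looser than intended but would not by itself explain a double replacement.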

DeletedAccount
A: 

I would try this

    content = Regex.Replace(content,"(?<!" + newDomain + ")^[^/]+/(?=" + match + ")", newDomain + match);

This will match (and thus replace the domain part of the expression) only if the domain is not newDomain and the path is match.

Simeon Pilgrim
Why the down vote? Does it not solve the problem? If so, please explain why. We are not psychic debuggers.
Simeon Pilgrim
A: 

Maybe I'm missing something, but should you be using negative lookbehinds at all? A lookbehind, by nature, does not consume any text: it only asserts what comes before the match. You, on the other hand, want to match the domain and the path, and then replace the domain. Right?

So it should be something more like this:

    Regex.Replace("http://www.foo.com/something", "(http://www.foo.com/)(something)", "http://www.abc.com/$2")

The idea is to use grouping to your advantage. The $2 part grabs the second half of the match (the path) and appends it to the new domain. I tested this in Regex Hero (a .NET regex tester) and it works. By the way, The Regex Coach uses a Perl-compatible engine rather than the .NET one, so you may run into some differences when comparing results.
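
As a rough sketch of the grouping approach, the program below applies the same replacement twice to show that a second pass leaves the text unchanged once the old domain is gone. The sample input and the variable names oldDomain, newDomain and path are assumptions for illustration, and it presumes the original URLs are absolute and carry the old domain.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupReplaceDemo
{
    static void Main()
    {
        // Domains and path taken from the example above; the names are illustrative.
        string oldDomain = "http://www.foo.com/";
        string newDomain = "http://www.abc.com/";
        string path = "something";

        // Group 1 captures the old domain, group 2 captures the path.
        string pattern = "(" + Regex.Escape(oldDomain) + ")(" + Regex.Escape(path) + ")";

        // Hypothetical chunk of response content.
        string input = "<a href=\"http://www.foo.com/something\">link</a>";

        // "$2" re-inserts the captured path after the new domain.
        string once = Regex.Replace(input, pattern, newDomain + "$2");

        // A second pass finds no old domain left, so nothing changes.
        string twice = Regex.Replace(once, pattern, newDomain + "$2");

        Console.WriteLine(once);
        Console.WriteLine(once == twice);
    }
}
```

Because the pattern requires the old domain to be part of the match, text that has already been rewritten no longer matches, which avoids the need for a lookbehind in this scenario.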

Steve Wortham
+1  A: 

I will try a third angle.

I think you are confusing the fact that your regex "matches" something in Regex Coach with it matching the part you want, which is why the replace results surprise you.

The replace swaps all of the matched input for the new token.

The negative lookbehind makes sure the pattern is not present, but the pattern is not part of the matched input.

Only the path (your match string) of your URL is the matched input, and you are replacing it with newDomain + match.

That is why you are getting the results you are getting.
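
To make the distinction between "asserted" and "matched" text concrete, here is a small sketch that lists what the question's pattern actually matches. The input string is a made-up sample containing one bare path and one path already prefixed with the new domain; newDomain and match are hypothetical values.

```csharp
using System;
using System.Text.RegularExpressions;

class LookbehindDemo
{
    static void Main()
    {
        // Hypothetical values standing in for the question's variables.
        string newDomain = "http://www.foo.com/";
        string match = "actual/url";

        // The lookbehind only asserts what precedes the match; it is not
        // included in the matched text.
        string pattern = "(?<!" + Regex.Escape(newDomain) + ")" + Regex.Escape(match);

        // One bare path, and one path that already carries the new domain.
        string input = "href=\"actual/url\" href=\"http://www.foo.com/actual/url\"";

        // Each match value is just the path, never the domain.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Value);
        }

        // The replacement therefore swaps only the path for newDomain + match.
        Console.WriteLine(Regex.Replace(input, pattern, newDomain + match));
    }
}
```

On a single complete string like this one, only the bare path should be reported as a match. Whether the same holds across chunked writes depends on where the chunk boundaries fall, which this sketch does not model.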

Simeon Pilgrim