So I have this regex:

&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)

That matches all &'s in a block of text that aren't already part of an entity.
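As a rough illustration of what that pattern does and doesn't touch, here is a minimal sketch. Python is an assumption here (the question doesn't name a language), and `encode_bare_ampersands` is a made-up helper name:

```python
import re

# The pattern from the question: an "&" that is NOT followed by something
# that looks like an entity body ending in ";" (named, decimal, or hex).
BARE_AMP = re.compile(r"&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)")


def encode_bare_ampersands(text: str) -> str:
    """Replace every bare "&" with "&amp;", leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", text)


print(encode_bare_ampersands("Tom & Jerry"))          # Tom &amp; Jerry
print(encode_bare_ampersands("Tom &amp; Jerry"))      # Tom &amp; Jerry (unchanged)
print(encode_bare_ampersands("&#38; &#x26; &copy;"))  # &#38; &#x26; &copy; (unchanged)
print(encode_bare_ampersands("a=1&b=2"))              # a=1&amp;b=2
```

The last call shows why query strings get caught: `&b=2` doesn't end in `;`, so the lookahead doesn't treat it as an entity.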

However, if I have this string:

& & & & & <a href="http://localhost/MyFile.aspx?mything=2&this=4">My Text &</a>
---------------------------------------------------------^

... the marked & also gets targeted, and as I'm using it to replace the &'s with &amp;, the URL then becomes invalid:

http://localhost/MyFile.aspx?mything=2&amp;this=4

D'oh! Does anyone know of a better way of encoding &'s that are not in a URL?

+4  A: 

No, the URL does not become invalid. The HTML code becomes:

<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">

This means that the code that was not correctly encoded is now correctly encoded, and the actual URL that the link contains is:

http://localhost/MyFile.aspx?mything=2&this=4

So, it's not a problem that the & character in the code gets encoded; on the contrary, the code is now correct.
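One way to check this for yourself is to run the encoded markup through an HTML parser and look at the href it reports. This is only a sketch using Python's standard `html.parser` (the language choice is an assumption, and `HrefCollector` and `extract_hrefs` are made-up names), but a browser decodes attribute values the same way:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href values of <a> tags, as decoded by the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs holds (name, value) pairs; character references in the
        # values have already been decoded by HTMLParser at this point.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def extract_hrefs(markup: str) -> list:
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


# The link after the bare "&" in the query string was encoded as "&amp;".
encoded = '<a href="http://localhost/MyFile.aspx?mything=2&amp;this=4">My Text &amp;</a>'

print(extract_hrefs(encoded))
# ['http://localhost/MyFile.aspx?mything=2&this=4']
```

The `&amp;` lives only in the HTML source. The URL that actually gets requested has a plain `&` again.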

Guffa