Should HtmlEncode() be abandoned and Replace() used instead if I want to parse links in posts/comments (with regular expressions)? HtmlEncode() replaces & with &amp;, which I assume can cause problems with links. Should I just use Replace() to replace < with &lt;?

For example, if a user posts something like:
See this site http://www.somesite.com/somepage.aspx?qs1=1&qs2=2&qs3=3

I want it to be:
See this site <a href="http://www.somesite.com/somepage.aspx?qs1=1&qs2=2&qs3=3">http://www.somesite.com/somepage.aspx?qs1=1&qs2=2&qs3=3</a>

But with HtmlEncode() the URL will become (notice the ampersand):
See this site http://www.somesite.com/somepage.aspx?qs1=1&amp;qs2=2&amp;qs3=3

Should I avoid the problem by using Replace() instead?

Thanks

A: 

What are you looking to replace, and why? HtmlEncode() is typically used to sanitize user-supplied data. That said, if you're allowing users to submit links, you probably don't want to HtmlEncode them in the first place. You're basically going to render them exactly as the user supplied them.

senfo
+1  A: 

Perhaps you are looking for UrlEncode()? http://msdn.microsoft.com/en-us/library/zttxte6w.aspx

Matt
Only if he wants to include the user-entered text as a querystring parameter...
Shog9
@shog9: Why only user-entered?
AnthonyWJones
@AnthonyWJones: that was the subject of the question. Of course, it could also be used for other text (that isn't already encoded) - but regardless, it is only useful for encoding bits of text for querystrings, not entire URLs to be used as URLs.
Shog9
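
To illustrate the distinction drawn in the comments above, here is a rough sketch in Python rather than .NET. It assumes `urllib.parse.quote_plus` as a stand-in for UrlEncode(), and the `search.aspx?q=` address is made up for the example. Encoding one querystring value works as intended, while encoding a whole URL leaves something that is no longer usable as a link.

```python
from urllib.parse import quote_plus

# Encoding a single piece of user text for use as a querystring value.
user_text = "fish & chips"
link = "http://www.somesite.com/search.aspx?q=" + quote_plus(user_text)  # hypothetical page

# Encoding an entire URL also escapes ":", "/", "?", "=" and "&",
# so the result can no longer be used as a URL.
broken = quote_plus("http://www.somesite.com/somepage.aspx?qs1=1&qs2=2")

print(link)    # http://www.somesite.com/search.aspx?q=fish+%26+chips
print(broken)  # http%3A%2F%2Fwww.somesite.com%2Fsomepage.aspx%3Fqs1%3D1%26qs2%3D2
```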
+4  A: 

Actually, your last example - the one you're worried about - is the only correct one. In HTML documents, ampersands are used to introduce entity references, and therefore must be escaped. While most browsers are forgiving enough to let them slip through when not obviously part of an entity reference, you can run into subtle problems should their use in a URL happen to look like an entity.

Let HtmlEncode() do its job.
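
One way this could look in practice, sketched in Python rather than .NET: encode the whole comment first, then run the link regex over the encoded text. Here `html.escape` stands in for HtmlEncode(), and the `render_comment` helper and its URL pattern are assumptions made up for this example, not a complete URL matcher.

```python
import html
import re

# Deliberately simple pattern: http/https up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def render_comment(raw_text):
    # Encode everything first so user-supplied markup is rendered as text.
    encoded = html.escape(raw_text, quote=True)
    # Then wrap each URL found in the encoded text in an anchor.
    # The "&amp;" inside the href is what valid HTML expects.
    return URL_PATTERN.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), encoded
    )

print(render_comment(
    "See this site http://www.somesite.com/somepage.aspx?qs1=1&qs2=2&qs3=3"
))
```

Because the regex only ever sees encoded text, any quotes or angle brackets the user typed are already entities by the time the anchor is built. A real pattern would also need to deal with trailing punctuation and with entities that directly follow a URL.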

Shog9
A: 

Replacing & with &amp; inside an href attribute is correct. If you do not, your markup is technically invalid. You should escape it even when it's inside a link. The only case where you'll run into problems is if you end up HTML-encoding it multiple times.
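
The multiple-encoding problem is easy to reproduce. A small sketch in Python, with `html.escape` and `html.unescape` assumed as stand-ins for HtmlEncode() and the browser's entity decoding:

```python
import html

url = "http://www.somesite.com/somepage.aspx?qs1=1&qs2=2"

once = html.escape(url)    # what the page source should contain
twice = html.escape(once)  # what it contains if the text is encoded a second time

print(once)   # http://www.somesite.com/somepage.aspx?qs1=1&amp;qs2=2
print(twice)  # http://www.somesite.com/somepage.aspx?qs1=1&amp;amp;qs2=2

# A browser decodes entities once: the singly encoded form yields the real URL,
# the doubly encoded form yields a URL that still contains "&amp;".
decoded_once = html.unescape(once)
decoded_twice = html.unescape(twice)
```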

Yuliy
A: 

I recommend against using Replace() to do the job of HtmlEncode() or UrlEncode(). These functions are designed to handle most of the problems you'd see in user-entered content. If you try to replace them with your own code, the results can get ugly if you forget something vital (I am speaking from experience here).
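
As one example of "something vital", here is a sketch in Python of what happens when only < is replaced, as the question proposes. The `naive_encode` helper and the payload are hypothetical, and `html.escape` stands in for HtmlEncode():

```python
import html

def naive_encode(text):
    # Hypothetical hand-rolled encoder that only handles "<".
    return text.replace("<", "&lt;")

# User input containing a double quote but no angle brackets.
payload = 'x" onmouseover="alert(1)'

naive_attr = '<a href="{}">link</a>'.format(naive_encode(payload))
safe_attr = '<a href="{}">link</a>'.format(html.escape(payload, quote=True))

print(naive_attr)  # <a href="x" onmouseover="alert(1)">link</a>
print(safe_attr)   # <a href="x&quot; onmouseover=&quot;alert(1)">link</a>
```

The hand-rolled version lets the quote close the href attribute early, so the user's text becomes a new attribute. The library encoder turns the quote into an entity and the input stays inside the attribute value.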

Cyril Gupta