When should I HTML-escape data in my code and when should I URL-escape it? I am confused about which one to use when.

For example, given an element that asks for a URL:

<input type="text" value="DATA" name="URL">

Should I HTML-Escape DATA here or URL-escape it here?

And what about an element:

<a href="URL" title="URL">NAME</a>

Should URL be URL-escaped or HTML-escaped? What about NAME?

Thanks, Boda Cydo.

+2  A: 

HTML-escape when you're writing anything to an HTML document.

URL-escape when you're constructing a URL to call in code, or for a browser to call (e.g. in the href attribute).

In your examples you'll want to 'Attribute' escape the attributes. (I can't remember the exact function name, but it's in HttpUtility).

Noon Silk
What is `HttpUtility`?
bodacydo
bodacydo: Sorry, I assumed you were using .NET. `HttpUtility` is a class in the `System.Web` namespace in .NET; if you mention which language you're using, someone may be able to point you to a library or class that can help with the encoding.
Noon Silk
I am using Python. I already found the `cgi.escape` and `urllib.quote_plus` functions, but I am still trying to understand which one to use. The guy below suggests doing both URL-escaping and HTML-escaping at the same time...
bodacydo
+1  A: 

In the examples you show, it should be first URL-escaped, then HTML-escaped:

<a href="http://www.example.com?arg1=this%2C+that&amp;arg2=blah">
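Since the asker mentions Python, here is a minimal sketch of that two-step order. It assumes Python 3, where `html.escape` and `urllib.parse.urlencode` are the standard-library counterparts of the `cgi.escape` and `urllib.quote_plus` functions mentioned in the comments; the `params` dictionary is a hypothetical stand-in for the dynamic values.

```python
import html
from urllib.parse import urlencode

# Hypothetical query parameters matching the example link.
params = {"arg1": "this, that", "arg2": "blah"}

# Step 1: URL-escape the dynamic parts (urlencode applies quote_plus to each value).
url = "http://www.example.com?" + urlencode(params)

# Step 2: HTML-escape the whole URL before placing it in the attribute.
href = html.escape(url, quote=True)

print('<a href="%s">' % href)
```

Only the `&` separating the two arguments changes in step 2, because it is the only character left in the URL that is special in HTML.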
Max Shawabkeh
Why both? I don't quite understand :(
bodacydo
Because it's a URL inside HTML. To be a valid URL, it has to contain only characters allowed in URLs, with invalid ones escaped. However, as far as the HTML is concerned, it's simply a text value, so it has to be escaped for HTML too.
Max Shawabkeh
What would happen if I only HTML-escaped it?
bodacydo
Most (probably all) browsers won't mind even if it's not escaped at all. A validator will protest if it's not HTML-escaped (regardless of whether it's URL-escaped). In practice, nothing will break if you don't URL-escape it, but that's the right way to handle it.
Max Shawabkeh
Tor Valamo
Thank you! I now understand everything!
bodacydo
+1  A: 

URL encoding ensures that special characters such as ? and & don't cause the URL to be misinterpreted on the receiving end. In practice, this means you'll need to URL encode any dynamic query string values that have a chance of containing such characters.
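To illustrate the misinterpretation, here is a small sketch, assuming Python 3's `urllib.parse`. The value `rock&roll` and the parameter name `q` are made up for the example; `parse_qs` stands in for whatever parses the query string on the receiving end.

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical user input containing a character that is reserved in URLs.
value = "rock&roll"

# Without URL encoding, the "&" is read as a separator between parameters.
raw_query = "q=" + value
raw_parsed = parse_qs(raw_query)

# With URL encoding, the "&" travels as %26 and is decoded back on arrival.
safe_query = "q=" + quote_plus(value)
safe_parsed = parse_qs(safe_query)

print(raw_parsed)   # the value is cut off at the "&"
print(safe_parsed)  # the full value survives
```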

HTML encoding ensures that special characters such as > and " don't cause the browser to misinterpret the markup. Therefore you need to HTML encode any values output into the markup that might contain such characters.

So in your example:

  • DATA needs to be HTML encoded.
  • Any dynamic segments of URL will need to be URL encoded, then the whole string will need to be HTML encoded.
  • NAME needs to be HTML encoded.
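The three rules above could be sketched in Python 3 roughly as follows. The helper names `render_input` and `render_link`, the `search` path, and the `lang=en` parameter are all hypothetical, chosen only to give the URL one dynamic segment and one literal `&`.

```python
import html
from urllib.parse import quote_plus

def render_input(data):
    # DATA is plain text inside an attribute: HTML-encode it (quotes included).
    return '<input type="text" value="%s" name="URL">' % html.escape(data, quote=True)

def render_link(search_term, name):
    # URL-encode only the dynamic segment of the URL...
    url = "http://www.example.com/search?q=" + quote_plus(search_term) + "&lang=en"
    # ...then HTML-encode the whole URL for use in the attributes.
    escaped_url = html.escape(url, quote=True)
    # NAME is element text: HTML-encode it.
    return '<a href="%s" title="%s">%s</a>' % (escaped_url, escaped_url, html.escape(name))

print(render_input('say "hi"'))
print(render_link("a&b", "<b>Tom & Jerry</b>"))
```

Note that the `&` inside the search term becomes `%26` (URL encoding), while the `&` that separates the two parameters becomes `&amp;` (HTML encoding).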
Nick Higgs
Your answer was the clearest. Understood everything without asking extra questions!
bodacydo