views:

653

answers:

4

When do you call Microsoft.Security.Application.AntiXss.HtmlEncode? Do you do it when the user submits the information or do you do when you're displaying the information?

How about for basic stuff like First Name, Last Name, City, State, Zip?

+3  A: 

You should only encode or escape your data at the last possible moment, whether that's directly before you put it in the database, or display it on the screen. If you encode too soon, you run the risk of accidentally double encoding (you'll often see & on newbies' websites - myself included).

If you do want to encode sooner than that, then take measures to avoid the double encoding. Joel wrote an article about good uses for hungarian notation, where he advocated use of prefixes to determine what is stored in the variable. eg: "us" for unsafe string, "ss" for safe string.

usFirstName = getUserInput('firstName')

ssFirstName = cleanString(usFirstName);

Also note that it doesn't matter what the type of information is (city, zip code, etc) - leaving any of these unchecked is asking for trouble.

nickf
`us` stands for unsafe and `ss` stands for `supersafe`? :D
LiraNuna
+10  A: 

You do it when you are displaying the information. Preserve the original as it was entered, convert it for display on a web page. Let's say you were displaying it in some other way, like exporting it into Excel. In that case, you'd want to export the preserved original.

Encode every single string.

Corey Trager
+2  A: 

It depends on your situation. Where I work, for years the company did no HTML encoding, so when we started doing it, it would have been almost impossible to find every location within the system that user input could be displayed on the page.

Instead we chose to sanitize input on its way into the system since there were fewer input points than output points. We sanitize immediately before inputting data into the DB, although we don't use Microsoft's AntiXss library, we use a set of homebrew methods that whitelist ranges of HTML tags and characters depending on the type of input.

If you're designing the system from scratch, or you have a system that is small (or managed well) enough to encode output, follow Corey's suggestion. It's definitely the better way to do it.

Dan Herbert
+1  A: 

Encoding is not a property of the data, it is a property of the transport mechanism. Therefore you should unencode data when you receive it, and encode it appropriately before transmission. The transport mechanism determines what sort of encoding is necessary.

This principle holds true whether your transport mechanism is HTML, HTTP, smoke signals, etc. The trick is knowing how to do the types of encoding manually, and when various frameworks do the steps for you automagically. For instance, ASP.NET will encode data assigned to a System.Web.UI.WebControls.Button's Text, but not text assigned to a System.Web.UI.WebControls.Literal's Text. jQuery will encode content you set with .innerText(), but not content you set with .innerHtml().

Frank Schwieterman