Say we have a form where the user types in various info. We validate the info, and find that something is wrong. A field is missing, invalid email, et cetera.

When displaying the form to the user again I of course don't want him to have to type in everything again so I want to populate the input fields. Is it safe to do this without sanitization? If not, what is the minimum sanitization that should be done first?

And to clarify: It would of course be sanitized before being, for example, added to a database or displayed elsewhere on the site.

+8  A: 

No it isn't. The user might be directed to the form from a third party site, or simply enter data (innocently) that would break the HTML.

Convert any character with special meaning to its HTML entity.

i.e. & to &amp;, < to &lt;, > to &gt; and " to &quot; (assuming you delimit your attribute values using " and not ').

In Perl use HTML::Entities, in TT use the html filter, in PHP use htmlspecialchars. Otherwise look for something similar in the language you are using.
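
As a minimal sketch of the same idea in Python (a language not named above, so treat this purely as an illustration): the standard library's `html.escape` converts `&`, `<` and `>`, and with `quote=True` also both quote characters. The helper name `render_text_input` is hypothetical.

```python
from html import escape


def render_text_input(name: str, value: str) -> str:
    """Return an <input> tag with the submitted value re-inserted, encoded."""
    # quote=True (the default) also encodes " and ', so the value cannot
    # close the double-quoted attribute it is placed in.
    return '<input type="text" name="{}" value="{}" />'.format(
        escape(name, quote=True), escape(value, quote=True)
    )


print(render_text_input("email", 'a"b<c>&d'))
```

The browser decodes the entities again when it displays the field, so the user still sees exactly what they typed.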

David Dorward
While you're correct, the OP didn't state context, and it's only in the context of the data being entered being able to be seen by *someone else*, (or being able to auto-populate it and redirect an existing/logged in, user to your page). Without this, you're just doing XSS in yourself, which is pretty useless.
Noon Silk
An attacker could produce a link, that simulates sending the form (GET-params), so an XSS-attack to someone else could be done.
Mnementh
Be careful using these functions that attempt to provide a one-stop shop for solving all injection problems. Not one of them I've seen in any language works in all areas of an HTML page (or at least not without corrupting data). Applying only PHP htmlspecialchars() mentioned here certainly won't keep you safe in an attribute value.
Cheekysoft
@Cheekysoft: How so? If you encode all the special characters, then what could happen? AFAIK the browser will display everything fine, there's no way to "break out" of the attribute/tag.
DisgruntledGoat
htmlspecialchars() *will* keep you safe in an attribute value unless you delimit attribute values with `'` and don't change `ENT_COMPAT` to `ENT_QUOTES`.
David Dorward
Ooooh. Having read the article it looks like there is a "if you don't quote your attribute values" rider implied on that. You should be safe if you put quotes around your attribute values (which is good style, required in XHTML, and required in HTML if you use most of the characters marked as dangerous).
David Dorward
@DisgruntledGoat yes, as David said, htmlspecialchars won't encode single quotes by default, so an attacker can escape the attribute value in <x x='inject' />. In addition, it won't encode colons and parentheses. Imagine <img src="inject" />, then inject "javascript:alert(document.cookie)".
Cheekysoft
Bear in mind, we are not restricting the handling of the character encoding, so additional multi-byte character attacks can be tried. I find this a good approach: `$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8'); $str = htmlentities($str, ENT_QUOTES, 'UTF-8');` Sadly, even this leaves IE6 vulnerable, because of the way it handles UTF. However, you could fall back to a more limited encoding, such as ISO-8859-1, until IE6 usage drops off.
Cheekysoft
A: 

Yes, it's safe, provided of course that you encode the value properly.

A value that is placed inside an attribute in an HTML document needs to be HTML encoded. The server-side platform that you are using should have methods for this. In ASP.NET, for example, there is a Server.HtmlEncode method, and the TextBox control will automatically HTML encode the value that you put in the Text property.

Guffa
-1 it is not safe.
Rook
@Rook: That is incorrect. It's safe, given the circumstances that I stated in the same sentence. If you can't be bothered to read the entire answer, you should at least read the entire first sentence before downvoting.
Guffa
"provided of course that you encode the value properly." was indeed the key here. Admittedly, that's a large and complicated proviso, but it is the important point. The important phase here is output-encoding and not input-validation. Whilst input-validation is, of course, highly desirable, it is important to understand that the output-encoding step is what protects from XSS.
Cheekysoft
+1  A: 

It is not safe, because, if someone can force the user to submit specific data to your form, you will output it and it will be "executed" by the browser. For instance, if the user is forced to submit '/><meta http-equiv="refresh" content="0;http://verybadsite.org" />, as a result an unwanted redirection will occur.
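
To make the breakout concrete, here is a small Python sketch (an illustration only; the `TEMPLATE` string and variable names are hypothetical). It places the payload above into a single-quoted `value` attribute, once raw and once passed through the standard library's `html.escape`.

```python
from html import escape

# The payload from the answer above: it closes the attribute and the tag,
# then appends a new element.
payload = '\'/><meta http-equiv="refresh" content="0;http://verybadsite.org" />'

# A hypothetical form field that delimits its attribute values with '.
TEMPLATE = "<input type='text' name='email' value='{}' />"

unsafe = TEMPLATE.format(payload)                    # markup is injected
safe = TEMPLATE.format(escape(payload, quote=True))  # stays a data value

print(unsafe)
print(safe)
```

In the raw version the document gains a `<meta>` element. In the encoded version the quotes and angle brackets become entities, so the whole payload remains inside the attribute value.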

naivists
If someone is forcing me to enter that into a browser and submit it, then what the browser does next is the least of my worries.
Paul Ruane
@Paul: yeah, I thought of http://imgs.xkcd.com/comics/security.png
Adriano Varoli Piazza
Kind of a bad example, since the form submitting to the OP's login form would have to be on a separate site. So the attacker would just redirect them to verybadsite.org straight off; there's no advantage from going through *another* site.
DisgruntledGoat
The example I had in mind when I wrote this post, is a scam e-mail, containing a link to (apparently friendly) @Svish's web site, but actually bringing you to the verybadsite.com. However, in this case the intermediary website has to be configured to accept GET data as POST (as I'm not sure that most e-mail clients let you submit POST data).
naivists
+1  A: 

You cannot insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data-values and never as HTML markup or Javascript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".

If inserting into an HTML attribute value, then you must ensure that the string cannot cause the attribute value to end prematurely. You must also, of course, ensure that the tag itself cannot be ended. You can achieve this by HTML-encoding any chars that are not guaranteed to be safe.

If you write HTML so that the value of the tag's attribute appears inside a pair of double-quote or single-quote characters, then you only need to ensure that you HTML-encode the quote character you chose to use. If you are not correctly quoting your attributes as described above, then you need to worry about many more characters, including whitespace, symbols, punctuation and other ASCII control chars. Although, to be honest, it's arguably safest to encode these non-alphanumeric chars anyway.

Remember that an HTML attribute value may appear in 3 different syntactical contexts:

Double-quoted attribute value

<input type="text" value="**insert-here**" />

You only need to encode the double quote character to a suitable HTML-safe value such as &quot;

Single-quoted attribute value

<input type='text' value='**insert-here**' />

You only need to encode the single quote character to a suitable HTML-safe value such as &#39;

Unquoted attribute value

<input type='text' value=**insert-here** />

You shouldn't ever have an html tag attribute value without quotes, but sometimes this is out of your control. In this case, we really need to worry about whitespace, punctuation and other control characters, as these will break us out of the attribute value.

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
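
The rule quoted above can be sketched in Python as follows. This is a minimal illustration rather than OWASP's own implementation: the function name `encode_html_attribute` is hypothetical, and leaving characters at code point 256 and above untouched is an assumption of this sketch. It also does not filter characters that are not valid in HTML at all, such as NUL.

```python
def encode_html_attribute(value: str) -> str:
    """Encode every non-alphanumeric character below code point 256 as &#xHH;."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch.isascii() and ch.isalnum():
            # ASCII letters and digits are safe in any attribute context.
            out.append(ch)
        elif code < 256:
            # Everything else below 256 becomes a hex character reference.
            out.append("&#x{:02X};".format(code))
        else:
            # Assumption: higher code points are passed through unchanged.
            out.append(ch)
    return "".join(out)


print(encode_html_attribute("x onmouseover=alert(1)"))
```

With the space and `=` encoded, the input can no longer start a second attribute, even when the value is unquoted.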

Please remember that the above rules only apply to control injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.

Please see the XSS prevention cheat sheet at OWASP for more information

Cheekysoft
so in PHP it would be enough with `htmlspecialchars` as long as you have your attribute values enclosed using double quotes? Like: `name="value"`.
Svish
@Svish in the exact circumstance you indicate, yes. ...until we start worrying about multibyte character attacks and malformed multibyte character attacks, but that's a bit complicated for here.
Cheekysoft