When outputting user input, do you only use htmlspecialchars(), or are there other functions/actions/methods you also run? I'm looking for something that will also deal with XSS.

I'm wondering if I should write a function that escapes user input on output or just use htmlspecialchars(). I'm looking for the generic cases, not the specific cases that can be dealt with individually.

+7  A: 

I usually use

htmlspecialchars($var, ENT_QUOTES)

on input fields. I created a method that does this because I use it a lot, and it makes the code shorter and more readable.
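
A minimal sketch of such a wrapper, assuming PHP 7 or later. The function name `e()` and the sample value are hypothetical, not taken from the answer, and the explicit 'UTF-8' argument is an assumption about the page encoding.

```php
<?php
// Hypothetical short-named helper; the name e() is an assumption.
function e(string $value): string
{
    // ENT_QUOTES escapes both " and ', so the result is also safe
    // inside single- or double-quoted attribute values.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample data containing both kinds of quotes and a tag.
$name = 'Tom "O\'Neil" <b>';

// The quotes in $name can no longer terminate the value attribute.
echo '<input type="text" name="name" value="' . e($name) . '">' . "\n";
```

With a short name like this, calling it on every piece of output costs very little typing.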

Ólafur Waage
Why the use of `ENT_QUOTES` (rather than `ENT_NOQUOTES` as suggested by me)?
Konrad Rudolph
I use ENT_QUOTES for input fields with data from the database. So if the data there contains ' or ", it will not close the value attribute within the input tag.
Ólafur Waage
Ólafur: yes, on input fields (or more generally attributes) it makes a certain sense. ;-)
Konrad Rudolph
+1. Defining your own function to do echo(htmlspecialchars()) with a nice short name takes the pain out of <?php-style literal output. In this case you'll use it for all your output so the ENT_QUOTES is necessary for the off-chance you use it in a single-quote-delimited attribute value.
bobince
Here's how people hack your site when you use htmlspecialchars: http://ha.ckers.org/blog/20070327/htmlspecialchars-strikes-again/
Syntax
A: 

Basically, any output that has been escaped in this manner can no longer directly execute/inject code into the page. htmlspecialchars does this for you, no need for custom functions. However, notice that it sometimes does too much, i.e. it escapes some things that need not be escaped in all instances. I therefore advise using the ENT_NOQUOTES argument. This results in slightly shorter HTML code more closely resembling code that would be written manually: people will not usually write &quot; instead of ", since this isn't required.

Compare:

<pre>string s = &quot;Hello world&quot;;</pre>

vs.

<pre>string s = "Hello world";</pre>

The exception is of course when the output is used in tag attributes, where a runaway quote would demolish the markup.
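
A rough sketch of the two cases, reusing the sample string from the comparison. The variable names and the explicit 'UTF-8' argument are assumptions for illustration.

```php
<?php
// Sample string taken from the comparison in the answer.
$code = 'string s = "Hello world";';

// Element content: quotes can stay literal, so ENT_NOQUOTES is enough.
$asContent = htmlspecialchars($code, ENT_NOQUOTES, 'UTF-8');
echo '<pre>' . $asContent . '</pre>' . "\n";

// Attribute value: the quotes must be escaped, otherwise the first "
// in the data would end the attribute early.
$asAttribute = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
echo '<input type="text" value="' . $asAttribute . '">' . "\n";
```

Both calls still escape &, < and >; they differ only in how quote characters are handled.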

Konrad Rudolph
Wouldn't you want to escape the quotes if you are using it in an input field: <input type="text" value="blah blah "blah blah" /> would break things
Darryl Hein
Darryl: You didn't mention the input field. Basically, that falls under my explicit mention of “unless you really need to escape quote characters.”
Konrad Rudolph
Would there be a reason not to escape the quotes?
Darryl Hein
No, there would not. An escaped quote *always* acts as a literal quote character and nothing else; there is no possibility of ‘escaping too much’. You might as well use ENT_QUOTES everywhere for consistency.
bobince
bobince: call me picky but this results in ugly HTML code. “too much” was of course meant aesthetically rather than functionally. Apparently quite a lot of people got this wrong. Too bad.
Konrad Rudolph
I've rewritten the questionable passage. FWIW, this doesn't change the contents nor the correctness of the text.
Konrad Rudolph
This code can seriously screw up your site. It can be very dangerous; you SHOULD NEVER use this kind of thing, especially when you put it in a function. It might look nicer, but it's an ugly bomb waiting to go off.
Christian Sciberras
@Christian: care to elaborate? How can this screw up the code, given that you as the developer know the *context* of the escaped strings? To make this perfectly clear, *unless* the escaped output is intended for use in an attribute, **this code isn’t dangerous at all**. Applying redundant, nonsensical checks “just to be sure” is cargo-cult programming.
Konrad Rudolph
Konrad, let me elaborate on two quotes of yours: "given that you as the developer know the context of the escaped strings" ... "unless the escaped output is intended for use in an attribute" (cf assumptions). In my own personal opinion, I don't sacrifice security for the sake of "nice looking code", especially when said code is being generated. It is dangerous because anyone seeing that code falls for a false sense of security, especially if you're using it a lot. I repeat, you should NOT use that. Also, you should set the encoding (arg) for optimal security (v BOFs).
Christian Sciberras
In short, "echo '<h1>'.htmlspecialchars($_REQUEST['title']).'</h1>';" IS safe. But this is not: "function nohtml($str){ return htmlspecialchars($str); }". I'm not talking in the sense of practical safety, but maintainability. I don't see a problem with ensuring proper encoding (since, by the way, w3c also recommends this).
Christian Sciberras
+4  A: 

Let's have a quick review of WHY escaping is needed in different contexts:

If you are in a quote-delimited string, you need to be able to escape the quotes.
If you are in XML, you need to separate "content" from "markup".
If you are in SQL, you need to separate "commands" from "data".
If you are on the command line, you need to separate "commands" from "data".

This is a really basic aspect of computing in general. Because the syntax that delimits data can occur IN THE DATA, there needs to be a way to differentiate the DATA from the SYNTAX, hence, escaping.

In web programming, the common escaping cases are:

  1. Outputting text into HTML

  2. Outputting data into HTML attributes

  3. Outputting HTML into HTML

  4. Inserting data into Javascript

  5. Inserting data into SQL

  6. Inserting data into a shell command

Each one has different security implications if handled incorrectly. THIS IS REALLY IMPORTANT! Let's review this in the context of PHP:

  1. Text into HTML: htmlspecialchars(...)

  2. Data into HTML attributes: htmlspecialchars(..., ENT_QUOTES)

  3. HTML into HTML: Use a library such as HTMLPurifier to ENSURE that only valid tags are present.

  4. Data into Javascript: I prefer json_encode. If you are placing it in an attribute, you still need to use #2, such as

  5. Inserting data into SQL: Each driver has an escape() function of some sort; using it is best. If you are running in a normal latin1 character set, addslashes(...) is suitable. Don't forget the quotes AROUND the addslashes() call:

    "INSERT INTO table1 SET field1 = '" . addslashes($data) . "'"

  6. Data on the command line: escapeshellarg() and escapeshellcmd() -- read the manual
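
Cases 1, 2, 4 and 6 from the list above could be sketched roughly as follows. The sample input and variable names are hypothetical, and the JSON_HEX_* flags are an addition not mentioned in the answer; they are standard json_encode options that keep characters like < out of the generated script. Cases 3 and 5 are left out because they need a third-party library and a database connection.

```php
<?php
// Hypothetical hostile input, used for every context below.
$input = '</script><b onclick="x">Tom & \'Jerry\'</b>';

// 1. Text into HTML: escape &, <, > and double quotes.
$asText = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// 2. Data into an HTML attribute: also escape single quotes.
$asAttribute = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// 4. Data into JavaScript: json_encode yields a valid JS string literal.
// The JSON_HEX_* flags (an assumption, not from the answer) additionally
// hex-escape <, >, &, ' and " inside the string, so the literal cannot
// close a surrounding <script> block.
$asJs = json_encode($input, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// 6. Data on the command line: pass it as one quoted argument.
$command = 'echo ' . escapeshellarg($input);

echo "<p>$asText</p>\n";
echo "<input value=\"$asAttribute\">\n";
echo "<script>var title = $asJs;</script>\n";
echo $command . "\n";
```

The same input ends up encoded four different ways, which is why a single "sanitize everything" function cannot cover all contexts.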

-- Take these to heart, and you will eliminate 95%* of common web security risks! (* a guess)

gahooa
A: 

You shouldn't be cleansing text on output; it should happen on input. I use a filter that filters all input to the application. It is configurable so that it can allow specific tags/data through when needed (say for a WYSIWYG editor).

You should do as little processing of text on output as possible so that you ensure speed. Processing it once creates a lot less strain than processing it 500,000 times.

Syntax
I would have agreed with you, except that generally the theory is to only accept input that is valid, but then you still need to escape output. The problem comes in when someone enters a quote in their username: once you escape it for HTML, it becomes useless for PDF generation.
Darryl Hein
Here's the thing though: if you are programming this for the web, how many times will it be viewed on the web versus exported as a PDF? I assume you aren't generating PDFs on the fly on every view. So my point stands strong: in the case of a PDF export you can simply reverse the escaping.
Syntax
Well, the rule still stands that you want to store the user's data as exactly as possible in the database and modify it as needed on output. I work on some sites where 50% of the output is web/HTML and 50% is PDF/Excel/etc.
Darryl Hein
Those sound more like web applications and not web sites. If I were to decode all output when it's called on my site, it would create a stupid amount of overhead. Very few websites on the face of the planet need to support xhtml, pdf, and excel at the same time and output to each. That's not standard.
Syntax
1. Filter input. 2. Escape output. http://shiflett.org/blog/2005/feb/my-top-two-php-security-practices
ejunker
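
The "filter input, escape output" rule from the last comment might look roughly like this. It is only a sketch: the function names and the validation rule are hypothetical, and JSON stands in for a second, non-HTML output format.

```php
<?php
// 1. Filter input: reject what is invalid, but keep valid data unmodified.
function validateUsername(string $raw): ?string
{
    $trimmed = trim($raw);
    // Hypothetical rule: 1 to 30 bytes, no control characters.
    if ($trimmed === '' || strlen($trimmed) > 30 || preg_match('/[\x00-\x1F]/', $trimmed)) {
        return null;
    }
    return $trimmed; // stored as-is, quotes included
}

// 2. Escape output: per output format, at the point of use.
function forHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$stored = validateUsername(" O'Brien ");

$htmlOut = forHtml($stored);      // for a web page
$jsonOut = json_encode($stored);  // for another format; no HTML entities

echo $htmlOut . "\n";
echo $jsonOut . "\n";
```

Because the stored value is never HTML-escaped, a second output format does not have to undo anything.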