I have a form text field that accepts a URL. When the form is submitted, I insert this field into the database with proper protection against SQL injection. My question, though, is about XSS.

This input field is a URL and I need to display it again on the page. How do I protect it from XSS on the way into the database (I think nothing is needed there, since I've already taken care of SQL injection) and on the way out of the database?

Let's pretend we have it like this (I'm simplifying it, so please don't worry about SQL injection). Where do I go from here?

$url = $_POST['url'];

Thanks

+8  A: 

Assuming this is going to be put into HTML content (such as between <body> and </body> or between <div> and </div>), you need to encode the 5 special XML characters (&, <, >, ", '), and OWASP recommends encoding the slash (/) as well. The PHP built-in htmlentities() will do the first part for you, and a simple str_replace() can handle the slash:

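function makeHTMLSafe($string) {
    // ENT_QUOTES encodes both " and ' in addition to &, < and >.
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    // OWASP also recommends encoding the forward slash.
    $string = str_replace('/', '&#x2F;', $string);
    return $string;
}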
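A quick check of what makeHTMLSafe() produces for a hostile value, assuming the result is echoed between <div> and </div>. The function is repeated so the snippet runs on its own; the sample input is made up for illustration.

```php
<?php
// Same function as above: encode &, <, >, ", ' and then the slash.
function makeHTMLSafe($string) {
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = str_replace('/', '&#x2F;', $string);
    return $string;
}

// Hypothetical submitted value that tries to inject a script element.
$url = 'http://example.com/"><script>alert(1)</script>';

// Every special character is now an entity, so the browser shows it as text.
echo '<div>' . makeHTMLSafe($url) . '</div>', "\n";
```

Because the encoding happens at output time, the value stored in the database stays exactly as the user submitted it.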

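If, however, you're going to be putting the tainted value into an HTML attribute, such as the href attribute of an <a> tag, then you'll need to encode a different set of characters. [space] % * + , - / ; < = > ^ and | can all break out of an unquoted attribute; you should also encode &, " and ', and you must double-quote your HTML attributes:

function makeHTMLAttributeSafe($string) {
    // ASCII codes of characters that can end an attribute value:
    // [space] " % & ' * + , - / ; < = > ^ |
    $scaryCharacters = array(32, 34, 37, 38, 39, 42, 43, 44, 45, 47, 59, 60, 61, 62, 94, 124);
    $translationTable = array();
    foreach ($scaryCharacters as $num) {
        // Build a hex character reference such as &#x22; for each one.
        $hex = str_pad(dechex($num), 2, '0', STR_PAD_LEFT);
        $translationTable[chr($num)] = '&#x' . $hex . ';';
    }

    // strtr() with an array never re-scans text it has already replaced,
    // so the & inside each &#x..; reference is not encoded a second time.
    $string = strtr($string, $translationTable);
    return $string;
}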
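Since the question is about a URL that ends up in href, note that entity encoding alone does not neutralize a value such as javascript:alert(1): none of its characters are in the list above, so it reaches the attribute unchanged and runs when the link is clicked. A common extra step is to accept only known-safe schemes before building the link. This sketch assumes only absolute http and https URLs should be allowed; the function name isSafeLinkUrl() is hypothetical.

```php
<?php
// Hypothetical helper: allow only absolute http(s) URLs in href.
// Relative URLs and every other scheme (javascript:, data:, ...) are rejected.
function isSafeLinkUrl($url) {
    // parse_url() returns null when there is no scheme and false on
    // seriously malformed input; both are treated as unsafe here.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }
    // Schemes are case-insensitive, so JaVaScRiPt: must not slip through.
    return in_array(strtolower($scheme), array('http', 'https'), true);
}

$url = 'javascript:alert(1)';
if (isSafeLinkUrl($url)) {
    // Still encode the value for the double-quoted attribute context.
    echo '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">link</a>', "\n";
} else {
    echo 'Rejected URL', "\n";
}
```

An allowlist is used rather than a blocklist of bad schemes, because anything the check does not recognize is refused by default.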

The final concern is invalid UTF-8: when delivered to some browsers, an ill-formed UTF-8 byte sequence can break out of an HTML entity. To protect against this, simply ensure that all the input you get is valid UTF-8:

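function assertValidUTF8($string) {
    // With the u modifier, preg_match() returns false for a subject that is
    // not valid UTF-8, so a non-empty string that fails to match is rejected.
    if (strlen($string) AND !preg_match('/^.{1}/us', $string)) {
        die;
    }

    return $string;
}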

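The u modifier on that regular expression makes PCRE treat the pattern and the subject as UTF-8. In this mode, preg_match() checks the entire subject for well-formed UTF-8 before it tries to match, and returns false if the check fails. So although the pattern only matches a single character (.), a successful match tells us the whole string is valid UTF-8.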
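If stopping the script with die is too abrupt, the same validation can be turned into a boolean check so the caller decides what to do with bad input. This sketch uses the empty pattern '//u', a common idiom that relies on the same behavior: preg_match() returns false when the subject is not valid UTF-8. The name isValidUTF8() is hypothetical.

```php
<?php
// Hypothetical helper: true if $string is well-formed UTF-8.
// With the u modifier, PCRE validates the whole subject before matching,
// and preg_match() returns false (not 0) when that validation fails.
function isValidUTF8($string) {
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUTF8("h\xC3\xA9llo")); // bool(true): "héllo" spelled as UTF-8 bytes
var_dump(isValidUTF8("\xC3\x28"));     // bool(false): lead byte not followed by a continuation byte
```

Comparing with === 1 matters: a plain truthiness check would also work here, but the strict comparison makes it explicit that false (an error) and 0 (no match) are both treated as invalid.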

Since this is all context-dependent, it's best to do any of this encoding at the latest possible moment, just before presenting output to the user. This habit also makes it easy to spot any places you've missed.
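To illustrate encoding at the last moment: the raw value is stored and passed around untouched, and the escaping is chosen only when it is written into a specific context. Besides HTML text, this sketch covers a value placed inside a <script> block, where HTML entities are the wrong tool; it assumes json_encode() with the JSON_HEX_* flags is acceptable for that context. The $row array stands in for a database result, and the helper names are hypothetical.

```php
<?php
// Stand-in for a row fetched from the database: the URL is stored raw.
$row = array('url' => 'http://example.com/?q=</script><script>alert(1)</script>');

// HTML text context: entity-encode at output time.
function escapeHtmlText($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// JavaScript string context inside <script>: produce a JSON string literal
// with <, >, &, ' and " hex-escaped so it cannot close the script element.
// Note: json_encode() returns false if $value is not valid UTF-8.
function escapeJsString($value) {
    return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

echo '<p>' . escapeHtmlText($row['url']) . '</p>', "\n";
echo '<script>var url = ' . escapeJsString($row['url']) . ';</script>', "\n";
```

The same stored value gets two different encodings, which is why encoding it once on the way into the database would not work.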

OWASP provides a great deal of information in its XSS Prevention Cheat Sheet.

Drew Stephens
I've never heard of any special precautions to be taken with HTML attributes, as opposed to text elements. Do you have any reference or explanation for that?
troelskn
Ah, to answer my own question: OWASP recommends this because it's needed *if attributes aren't quoted*. I'd recommend quoting attributes instead.
troelskn
+1  A: 

You need to encode it with htmlspecialchars() before displaying it to a user. This is usually enough when dealing with data outside of <script> tags and HTML tag attributes.
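A minimal sketch of this answer applied to the question's $url, assuming the value is printed both as link text and inside a double-quoted href (the comments on the first answer suggest quoting attributes). ENT_QUOTES makes htmlspecialchars() encode single quotes as well as double quotes; the sample value is made up. This covers only the encoding step, not which URL schemes are allowed.

```php
<?php
// Hypothetical submitted value trying to break out of the href attribute.
$url = 'http://example.com/" onmouseover="alert(1)';

// Encode &, <, >, " and ' for HTML text and double-quoted attributes.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $safe . '">' . $safe . '</a>', "\n";
```

With the double quotes encoded as &quot;, the injected onmouseover stays part of the attribute value instead of becoming a new attribute.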

sanmai
+1  A: 

Don't roll your own XSS protection; there are too many ways something might slip through. (I can't find the link to a certain XSS demo page anymore, but the number of possibilities is staggering: broken IMG tags, weird attributes, etc.)

Use an existing library like sseq-lib or extract one from an established framework.

Update: here's the XSS demo page.

christian studer