ansaurus

Question

The input is a URL. How do I protect it from XSS?

Answer 1

+8 A:

Assuming this is going to be put into HTML content (such as between <body> and </body> or between <div> and </div>), you need to encode the 5 special XML characters (&, <, >, ", '), and OWASP recommends including slash (/) as well. The PHP builtin htmlentities(), called with ENT_QUOTES so that single quotes are covered too, will do the first part for you, and a simple str_replace() can do the slash:

function makeHTMLSafe($string) {
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = str_replace('/', '&#x2F;', $string);
    return $string;
}
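
To see what the function produces, here is a small self-contained run. It repeats makeHTMLSafe() from above so the snippet stands alone; the sample input is made up for illustration.

```php
<?php
// Copy of makeHTMLSafe() from the answer, repeated so this snippet runs on its own.
function makeHTMLSafe($string) {
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = str_replace('/', '&#x2F;', $string);
    return $string;
}

// Made-up tainted input containing all five special characters and a slash.
$tainted = 'a/b<c>&"d\'';

echo makeHTMLSafe($tainted), "\n";
// prints: a&#x2F;b&lt;c&gt;&amp;&quot;d&#039;
```

The slash is replaced after htmlentities() runs, which is safe because none of the entities it emits contain a slash.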

If, however, you're going to put the tainted value into an HTML attribute, such as the href attribute of an <a> tag, then you'll need to encode a different set of characters ([space] % * + , - / ; < = > ^ and |), and you must double-quote your HTML attributes:

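function makeHTMLAttributeSafe($string) {
    // ASCII codes for: space % * + , - / ; < = > ^ |
    $scaryCharacters = array(32, 37, 42, 43, 44, 45, 47, 59, 60, 61, 62, 94, 124);
    $translationTable = array();
    foreach ($scaryCharacters as $num) {
        // Map each character to its hex numeric entity, e.g. ' ' => '&#x20;'
        $hex = str_pad(dechex($num), 2, '0', STR_PAD_LEFT);
        $translationTable[chr($num)] = '&#x' . $hex . ';';
    }

    // strtr() with an array replaces in one pass, so output is never re-encoded
    $string = strtr($string, $translationTable);
    return $string;
}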
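
One thing to watch: the character list above does not include the double quote, the ampersand or the single quote, so a " in the input could still close a double-quoted attribute. A possible variant, which is not part of the original answer, adds code points 34, 38 and 39 to the table. The name makeQuotedAttributeSafe() and the sample URL are made up for this sketch.

```php
<?php
// Hypothetical variant of makeHTMLAttributeSafe(): same technique, but the
// table also covers " (34), & (38) and ' (39).
function makeQuotedAttributeSafe($string) {
    $codes = array(32, 34, 37, 38, 39, 42, 43, 44, 45, 47, 59, 60, 61, 62, 94, 124);
    $table = array();
    foreach ($codes as $num) {
        $table[chr($num)] = '&#x' . str_pad(dechex($num), 2, '0', STR_PAD_LEFT) . ';';
    }
    // Single pass: the & and ; emitted by a replacement are not encoded again.
    return strtr($string, $table);
}

// Made-up tainted URL; the attribute is double-quoted as the answer requires.
$url = 'http://example.com/?a=1&b="x"';
echo '<a href="' . makeQuotedAttributeSafe($url) . '">link</a>', "\n";
```

This only addresses breaking out of the attribute value; it is a sketch under those assumptions rather than a complete defence.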

The final concern is illegal UTF-8 characters—when delivered to some browsers, an ill-formed UTF-8 byte sequence can break out of an HTML entity. To protect against this, simply ensure that all the UTF-8 characters you get are valid:

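function assertValidUTF8($string) {
    // With the u modifier, PCRE validates the whole subject as UTF-8 before
    // matching; preg_match() returns false (not 0) if it is ill-formed.
    if (strlen($string) AND !preg_match('/^.{1}/us', $string)) {
        die;
    }

    return $string;
}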

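The u modifier on that regular expression makes it a Unicode matching regex. With u set, PCRE checks that the whole subject string is valid UTF-8 before it attempts a match, so matching a single character, ., is enough: if any byte sequence is ill-formed, preg_match() fails and the function dies.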
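
If you would rather get a boolean back than call die, the same check can be sketched as follows. The helper name isValidUTF8() is made up for this example, and it relies on the empty pattern with the u modifier matching any valid subject.

```php
<?php
// Hypothetical helper: true if $string is well-formed UTF-8, false otherwise.
function isValidUTF8($string) {
    // The empty pattern matches any valid subject (including '') and returns 1;
    // on ill-formed UTF-8, preg_match() returns false instead.
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUTF8("caf\xC3\xA9")); // bool(true): é as a valid two-byte sequence
var_dump(isValidUTF8("caf\xE9"));     // bool(false): lone 0xE9 is a truncated sequence
```

The caller can then decide whether to reject the request, log it, or fall back to a default.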

Since this is all context-dependent, it's best to do any of this encoding at the latest possible moment, just before presenting output to the user. Making this a habit also makes it easy to spot any places you've missed.
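
As a rough illustration of encoding late and per context, the sketch below keeps one raw value and encodes it differently depending on where it lands. The function name renderSearchLink() and the /search path are invented for the example.

```php
<?php
// Hypothetical helper: builds a search link from a raw, unencoded value.
// The value stays raw until this point; each output context gets its own encoding.
function renderSearchLink($rawTerm) {
    // URL context: percent-encode the value for use as a query-string parameter.
    $href = '/search?q=' . rawurlencode($rawTerm);
    // HTML text context: entity-encode the same raw value for display.
    $label = htmlspecialchars($rawTerm, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $label . '</a>';
}

echo renderSearchLink('a b&c'), "\n";
// prints: <a href="/search?q=a%20b%26c">a b&amp;c</a>
```

Had the value been entity-encoded when it was first received, the query string would have carried &amp; instead of the user's actual input.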

OWASP provides a great deal of information on their XSS prevention cheat sheet.

Drew Stephens 2009-11-09 05:45:53

I've never heard of any special precautions for HTML attributes as opposed to text content. Do you have a reference or explanation for that?

troelskn 2009-11-09 07:40:54

Ah .. To answer my own question, OWASP recommends this because it's needed *if attributes aren't quoted*. I'd recommend quoting attributes instead.

troelskn 2009-11-09 07:45:21

Answer 2

+1 A:

You need to encode it with htmlspecialchars() before displaying it to a user. Usually this is enough when the data ends up outside of <script> tags and HTML tag attributes.
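
A minimal sketch of that call, with a made-up URL as input. ENT_QUOTES is passed explicitly because before PHP 8.1 the default flags left single quotes unencoded.

```php
<?php
// Made-up tainted URL carrying a script payload in its query string.
$url = 'http://example.com/?q=<script>alert(1)</script>';

// Explicit flags and charset so the result is the same on every PHP version.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<p>You entered: ' . $safe . '</p>', "\n";
// prints: <p>You entered: http://example.com/?q=&lt;script&gt;alert(1)&lt;/script&gt;</p>
```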

sanmai 2009-11-09 05:51:17

Answer 3

+1 A:

Don't roll your own XSS protection; there are too many ways something might slip through (I can't find the link to a certain XSS-demopage anymore, but the number of possibilities is staggering: broken IMG tags, weird attributes, etc.).

Use an existing library like sseq-lib or extract one from an established framework.

Update: Here's the XSS-demopage.

christian studer 2009-11-09 06:16:48
