I had a regex as the first line of defense against XSS.

public static function standard_text($str)
{
    // \pL matches letters
    // \pN matches numbers
    // \pZ matches separators (whitespace)
    // \p{Pc} matches connector punctuation (e.g. underscores)
    // \p{Pd} matches dash punctuation
    // \p{Po} matches other (ordinary) punctuation
    return (bool) preg_match('/^[\pL\pN\pZ\p{Pc}\p{Pd}\p{Po}]++$/uD', (string) $str);
}

It is actually from Kohana 2.3.
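This character class also explains the client's problem below: parentheses belong to the Unicode categories \p{Ps}/\p{Pe} (open/close punctuation), which the pattern does not include, so any input containing them is rejected. A minimal standalone sketch (the helper reproduced outside its Kohana class for illustration):

```php
<?php
// Standalone copy of the validator, outside its Kohana helper class.
function standard_text($str)
{
    // Allow only letters, numbers, separators, and connector/dash/other punctuation.
    return (bool) preg_match('/^[\pL\pN\pZ\p{Pc}\p{Pd}\p{Po}]++$/uD', (string) $str);
}

var_dump(standard_text('Hello, world!')); // true  — ',' and '!' are \p{Po}
var_dump(standard_text('Hello (world)')); // false — '(' is \p{Ps}, ')' is \p{Pe}
```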

This runs on publicly entered text (no HTML is ever allowed), and the input is rejected if it fails this test. The text is always displayed through htmlspecialchars() (or, more specifically, Kohana's flavour of it, which adds the character set amongst other things). I also run strip_tags() on output (even though I know it can mangle text like: 5 < 3!! :>).
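For illustration, here is how those two output steps behave on that very example (a plain sketch, not the Kohana wrappers themselves):

```php
<?php
$input = '5 < 3!! :>';

// Escaping keeps the text intact and safe for HTML output.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n"; // 5 &lt; 3!! :&gt;

// strip_tags() may treat "< 3!! :>" as a tag and mangle the text —
// exactly the "5 < 3" problem mentioned above.
echo strip_tags($input), "\n";
```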

The client ran into a problem when he wanted to enter some text with parentheses. I thought about modifying or extending the helper, but I also had a second thought: if I allow double quotes anyway, is there really any reason to validate at all?

Can I just rely on the escaping on output?

+3  A: 

It's never safe to rely on regexes to filter dangerous XSS attacks. And although you are not relying on them alone, output escaping and input filtering, when used correctly, will stop all kinds of attacks. There is therefore no point in keeping regexes as a "first line of defense" when their help isn't actually needed; as you and your client have discovered, they only complicate things when used like this.

Long story short: if you use htmlentities() or htmlspecialchars() to escape your output, you don't need regexes, nor do you really need strip_tags() either.
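As a sketch of why output escaping alone suffices here: with ENT_QUOTES, htmlspecialchars() converts every character an attacker needs to break out of HTML text or a quoted attribute.

```php
<?php
$payload = '"><script>alert(1)</script>';
$safe = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

echo $safe, "\n"; // &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;

// Even inside a quoted attribute the payload cannot break out:
echo '<input value="' . $safe . '">', "\n";
```

Note this holds for HTML text and quoted attribute contexts; JavaScript, CSS, and URL contexts need their own escaping.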

SimpleCoder