In my PHP web app, suppose I want to go the extra mile: in addition to rigorously sanitizing my inputs, I also want to ensure that no JavaScript ends up in the strings I insert into templated HTML.

Is there a standard way to make sure I don't put JavaScript in the generated HTML content?

A: 

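There isn't exactly a standard way. Consider what happens if you have <img src="${path}"> and ${path} expands to http://p0wned.com/jpg.jpg" /><script src="p0wned.com/js.js"/>: the quote in the value closes the src attribute, and the script tag that follows becomes part of the page.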

Anyway, I like this regular expression (note that it is Perl, not PHP):

#from http://www.perlmonks.org/?node_id=161281
sub untag {
  local $_ = $_[0] || $_;
# ALGORITHM:
#   find < ,
#       comment <!-- ... -->,
#       or comment <? ... ?> ,
#       or one of the start tags that requires a matching
#           end tag, plus everything up to that end tag
#       or if \s or ="
#           then skip to next "
#           else [^>]
#   >
  s{
    <               # open tag
    (?:             # open group (A)
      (!--) |       #   comment (1) or
      (\?) |        #   another comment (2) or
      (?i:          #   open group (B) for /i
        ( TITLE  |  #     one of the start tags
          SCRIPT |  #     whose content
          APPLET |  #     must be skipped
          OBJECT |  #     entirely, up to
          STYLE     #     the matching
        )           #     end tag (3)
      ) |           #   close group (B), or
      ([!/A-Za-z])  #   one of these chars, remember in (4)
    )               # close group (A)
    (?(4)           # if previous case is (4)
      (?:           #   open group (C)
        (?!         #     and next is not : (D)
          [\s=]     #       \s or "="
          ["`']     #       with open quotes
        )           #     close (D)
        [^>] |      #     and not close tag or
        [\s=]       #     \s or "=" with
        `[^`]*` |   #     something in quotes ` or
        [\s=]       #     \s or "=" with
        '[^']*' |   #     something in quotes ' or
        [\s=]       #     \s or "=" with
        "[^"]*"     #     something in quotes "
      )*            #   repeat (C) 0 or more times
    |               # else (if previous case is not (4))
      .*?           #   minimum of any chars
    )               # end if previous char is (4)
    (?(1)           # if comment (1)
      (?<=--)       #   wait for "--"
    )               # end if comment (1)
    (?(2)           # if another comment (2)
      (?<=\?)       #   wait for "?"
    )               # end if another comment (2)
    (?(3)           # if one of tags-containers (3)
      </            #   wait for end
      (?i:\3)       #   of this tag
      (?:\s[^>]*)?  #   skip junk to ">"
    )               # end if (3)
    >               # tag closed
   }{}gsx;          # STRIP THIS TAG
  return $_ ? $_ : "";
}
dlamblin
+1  A: 

If you aren't opposed to external dependencies, the HTML Purifier library is a pretty good filter for a majority of XSS attacks.

leek
A: 

In PHP, I'd start with strip_tags. Like so:

$output = strip_tags($input);

If I wanted to allow some tags in user input, I'd include them, like so:

$output = strip_tags($input, '<code><em><strong>');
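
One caveat worth checking before relying on the allow-list form: as far as I know, strip_tags leaves the attributes of any allowed tag untouched, and it removes tags but not the text between them. A minimal sketch of both behaviors (the sample input is made up for illustration):

```php
<?php
// Hypothetical user input: a script element plus an allowed tag carrying an event handler.
$input = '<script>alert(1)</script><em onclick="alert(2)">hi</em>';

// No allow-list: every tag is removed, but the text that was inside <script> remains.
$plain = strip_tags($input);

// Allow-list: <em> survives, and so does its onclick attribute.
$partial = strip_tags($input, '<em>');

echo $plain, "\n";
echo $partial, "\n";
```

So an allow-list on its own may still let inline event handlers through; allowed tags probably need attribute filtering or a dedicated purifier on top.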
Kent Brewster
A: 

I don't think it's possible to reliably detect JavaScript code like that.

You'd have to pass the data through an interpreter of some kind to try to find valid JS statements. That would be very processor-intensive and would probably generate many false positives, depending on the nature of your text.

Entity-escaping metacharacters on output is probably the best way to further protect your application from attacks your filter may have missed. JavaScript can't run if the browser receives it as plain text.
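
A minimal sketch of that escaping step, assuming UTF-8 output. The helper name e and the sample value are made up for illustration; htmlspecialchars with ENT_QUOTES converts <, >, & and both quote characters to entities, so injected markup arrives as text:

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute value.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Attacker-controlled value that tries to close the attribute and inject a script.
$path = 'http://p0wned.com/jpg.jpg" /><script src="p0wned.com/js.js"/>';

// The escaped value can no longer break out of the src attribute.
$html = '<img src="' . e($path) . '">';
echo $html, "\n";
```

This only covers HTML text and quoted attribute values. Values placed inside a script block, or URLs whose scheme is user-controlled, need different handling.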

tduehr