I want to allow users to paste <embed> and <object> HTML fragments (video players) via an HTML form. The server-side code is PHP. How can I protect against malicious pasted code, JavaScript, etc.? I could parse the pasted code, but I'm not sure I could account for all variations. Is there a better way?

A: 

I'm not really sure what parameters EMBED and OBJECT take, as I've never really dealt with putting media on a page (which is actually kind of shocking to think about), but I would take a BB Code approach: have users enter something like [embed url="http://www.whatever.com/myvideo.whatever" ...], then parse out the URL and any other values, make sure they are legitimate, and build your own <EMBED> tag.
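The parsing half of that BB Code idea could look roughly like this. It is a minimal sketch: the [embed url="..." width="..." height="..."] syntax and the function name parse_embed_bbcode are hypothetical, and validating the values and building the real tag are left as a separate step.

```php
<?php
// Hypothetical BB-code syntax: [embed url="..." width="..." height="..."]
// Returns the key="value" pairs as an associative array, or null if the
// input is not exactly one such tag.
function parse_embed_bbcode(string $input): ?array
{
    // The whole (trimmed) input must be a single [embed ...] tag.
    if (!preg_match('/^\[embed((?:\s+[a-z]+="[^"]*")*)\s*\]$/', trim($input), $tag)) {
        return null;
    }
    // Collect each name="value" pair from the tag body.
    preg_match_all('/([a-z]+)="([^"]*)"/', $tag[1], $pairs, PREG_SET_ORDER);
    $attrs = [];
    foreach ($pairs as $pair) {
        $attrs[$pair[1]] = $pair[2];
    }
    return $attrs;
}

$attrs = parse_embed_bbcode('[embed url="http://blip.tv/play/AZ_iEoaIfA" width="640" height="510"]');
print_r($attrs);
```

Because anything that is not exactly one [embed ...] tag returns null, pasted raw HTML never reaches the output stage at all.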

edit: Alright, something like this should be fine:

$youtube = '<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1"></param><embed src="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="344"></embed></object>';

$blip = '<embed src="http://blip.tv/play/AZ_iEoaIfA" type="application/x-shockwave-flash" width="640" height="510" allowscriptaccess="always" allowfullscreen="true"></embed>';

// Capture every name="value" pair; + (not *) so an attribute name can't be empty.
preg_match_all("/([A-Za-z]+)\=\"(.+?)\"/", $youtube, $matches1);
preg_match_all("/([A-Za-z]+)\=\"(.+?)\"/", $blip, $matches2);
print '<pre>' . print_r($matches1, true). '</pre>';
print '<pre>' . print_r($matches2, true). '</pre>';

This will output:

Array
(
[0] => Array
    (
        [0] => width="425"
        [1] => height="344"
        [2] => name="movie"
        [3] => value="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1"
        [4] => src="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1"
        [5] => type="application/x-shockwave-flash"
        [6] => allowfullscreen="true"
        [7] => width="425"
        [8] => height="344"
    )

[1] => Array
    (
        [0] => width
        [1] => height
        [2] => name
        [3] => value
        [4] => src
        [5] => type
        [6] => allowfullscreen
        [7] => width
        [8] => height
    )

[2] => Array
    (
        [0] => 425
        [1] => 344
        [2] => movie
        [3] => http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1
        [4] => http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1
        [5] => application/x-shockwave-flash
        [6] => true
        [7] => 425
        [8] => 344
    )
)

Array
(
[0] => Array
    (
        [0] => src="http://blip.tv/play/AZ_iEoaIfA"
        [1] => type="application/x-shockwave-flash"
        [2] => width="640"
        [3] => height="510"
        [4] => allowscriptaccess="always"
        [5] => allowfullscreen="true"
    )

[1] => Array
    (
        [0] => src
        [1] => type
        [2] => width
        [3] => height
        [4] => allowscriptaccess
        [5] => allowfullscreen
    )

[2] => Array
    (
        [0] => http://blip.tv/play/AZ_iEoaIfA
        [1] => application/x-shockwave-flash
        [2] => 640
        [3] => 510
        [4] => always
        [5] => true
    )
)

From then on it's pretty straightforward. Width and height can be verified with is_numeric, and the rest of the values can be run through htmlentities before you construct your own <embed> tag from them. I am pretty certain this would be safe. You could even build the full-fledged <object> version like YouTube's (which I assume works in more places) for blip.tv links, since you would have all the required data.
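A rough sketch of that rebuild step, working on the two capture arrays from preg_match_all. The attribute whitelist and the function name rebuild_embed are assumptions for illustration. Two adjustments to the plan above are worth noting: ctype_digit is used instead of is_numeric, because is_numeric also accepts values such as "1e3" or "-5"; and pasted values are already HTML-encoded (&amp;), so they are decoded once before being re-encoded, to avoid &amp;amp;. A scheme check is added as well, since htmlentities alone would not stop a javascript: URL.

```php
<?php
// Sketch: rebuild a clean <embed> from preg_match_all's $m[1] (names) and
// $m[2] (values). The whitelist is an assumption; allowscriptaccess is left out.
function rebuild_embed(array $names, array $values): ?string
{
    $attrs = [];
    foreach ($names as $i => $name) {
        // Decode entities once; later duplicates (e.g. width) overwrite earlier ones.
        $attrs[strtolower($name)] = html_entity_decode($values[$i], ENT_QUOTES, 'UTF-8');
    }

    // Width and height must be plain digit strings.
    foreach (['width', 'height'] as $dim) {
        if (!isset($attrs[$dim]) || !ctype_digit($attrs[$dim])) {
            return null;
        }
    }

    // Only accept http(s) player URLs.
    $scheme = strtolower((string) parse_url($attrs['src'] ?? '', PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    // Emit whitelisted attributes only, re-encoded for a double-quoted attribute.
    $html = '<embed';
    foreach (['src', 'type', 'width', 'height', 'allowfullscreen'] as $name) {
        if (isset($attrs[$name])) {
            $html .= sprintf(' %s="%s"', $name, htmlentities($attrs[$name], ENT_QUOTES, 'UTF-8'));
        }
    }
    return $html . '></embed>';
}

$blip = '<embed src="http://blip.tv/play/AZ_iEoaIfA" type="application/x-shockwave-flash" width="640" height="510" allowscriptaccess="always" allowfullscreen="true"></embed>';
preg_match_all('/([A-Za-z]+)="(.+?)"/', $blip, $m);
echo rebuild_embed($m[1], $m[2]);
```

This still trusts whatever host the src points at; restricting that is a separate concern.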

You may run into some quirks with embed codes from other video-sharing websites, but this will hopefully get you started. Good luck.

Paolo Bergantino
You can look at the <embed> code for any YouTube or blip.tv video to see what's there. It's a lot more than just a URL.
Doug Kaye
At first I didn't like this solution because it simply pattern-matched rather than actually parsing the HTML. But it turns out very few attributes are actually required for Flash, so I used this as the basis for my final solution. Thanks, Paolo.
Doug Kaye
A: 

Here's an example of pasted code from blip.tv:

<embed src="http://blip.tv/play/AZ_iEoaIfA" type="application/x-shockwave-flash"    
  width="640" height="510" allowscriptaccess="always" allowfullscreen="true"></embed>

Here's an example of what you might get from YouTube:

<object width="425" height="344">
  <param name="movie" value="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1"></param>
  <param name="allowFullScreen" value="true"></param>
    <embed src="http://www.youtube.com/v/Z75QSExE0jU&amp;hl=en&amp;fs=1"
      type="application/x-shockwave-flash" allowfullscreen="true"
      width="425" height="344"></embed>
</object>
Doug Kaye
A: 

Your chances of reliably detecting malicious code by scanning submitted HTML are about nil. There are so many possible ways to inject script (including browser-specific malformed HTML) that you won't be able to catch them all. If the big webmail providers are still finding new exploits after years of trying, there is no chance you'll be able to do it.

Whitelisting is better than blacklisting. So you could instead require the input to be XHTML, and parse it using a standard XML parser. Then walk through the DOM and check that each of the elements and attributes is known-good, and if everything's OK, serialise back to XHTML, which, coming from a known-good DOM, should not be malformed. A proper XML parser with Unicode support should also filter out nasty 'overlong UTF-8 sequences' (a security hole affecting IE6 and older Operas) for free.
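A minimal sketch of that whitelist approach using PHP's DOM extension. The element/attribute list and the function names are assumptions for illustration. Besides unknown elements and attributes, it also rejects comments, processing instructions and CDATA, keeping only elements and text. It does not check attribute values such as the player URL.

```php
<?php
// Known-good elements and their allowed attributes (an assumption; extend as needed).
const ALLOWED = [
    'object' => ['width', 'height'],
    'param'  => ['name', 'value'],
    'embed'  => ['src', 'type', 'width', 'height', 'allowfullscreen'],
];

// True if every descendant of $node is a text node or a whitelisted element
// carrying only whitelisted attributes.
function is_known_good(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // text is escaped again when serialised
        }
        if ($child->nodeType !== XML_ELEMENT_NODE || !array_key_exists($child->nodeName, ALLOWED)) {
            return false; // unknown elements, comments, PIs, CDATA
        }
        foreach ($child->attributes as $attr) {
            if (!in_array($attr->nodeName, ALLOWED[$child->nodeName], true)) {
                return false;
            }
        }
        if (!is_known_good($child)) {
            return false;
        }
    }
    return true;
}

// Parse as XML, check against the whitelist, re-serialise; null on any failure.
function sanitise_xhtml(string $input): ?string
{
    $doc = new DOMDocument();
    $prevErrors = libxml_use_internal_errors(true);
    // Wrap in a root element so several top-level tags still form one document.
    $ok = $doc->loadXML('<root>' . $input . '</root>', LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($prevErrors);

    if (!$ok || !is_known_good($doc->documentElement)) {
        return null; // malformed or not on the whitelist: reject, don't repair
    }
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}
```

With this particular list, blip.tv's own snippet would be rejected because allowscriptaccess is not whitelisted, which is arguably the point.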

However... if you allow embeds/objects from any domain, you are already giving external domains full script access to your page, so HTML injection is the least of your worries. Plug-ins such as Flash are likely to be able to execute JavaScript without any kind of trickery being necessary.
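One way to act on this is to accept a player URL only if its host is on a fixed list you trust. A minimal sketch; the host list and the function name is_trusted_player_url are hypothetical:

```php
<?php
// Hypothetical list of provider hosts you have decided to trust.
const TRUSTED_HOSTS = ['www.youtube.com', 'youtube.com', 'blip.tv'];

// True only for http(s) URLs whose host exactly matches a trusted host.
function is_trusted_player_url(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // Exact match: a substring or suffix test would let "blip.tv.evil.example" through.
    return in_array(strtolower($parts['host']), TRUSTED_HOSTS, true);
}

var_dump(is_trusted_player_url('http://blip.tv/play/AZ_iEoaIfA'));
```

Matching the whole host, rather than searching for "youtube.com" inside the URL, also defeats tricks like http://blip.tv@evil.example/, where parse_url reports evil.example as the host.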

So you should be limiting the source of objects to predetermined, known-good domains. And if you're already doing that, it's probably easier to just let the user choose a video provider and clip ID, and then convert that into the proper, known-good embedding code for that provider. For example, if you are using a bbcode-like markup, the traditional way to let users include a YouTube clip would be something like [youtube]Dtzs7DSh[/youtube].
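A sketch of that conversion: only the clip ID comes from the user, and everything else is a fixed template. The ID pattern ([A-Za-z0-9_-]+) and the markup, modelled on the YouTube snippet quoted earlier in this thread, are assumptions; the function name expand_youtube_bbcode is hypothetical.

```php
<?php
// Expand [youtube]CLIP_ID[/youtube] into fixed embed markup. Anything that
// doesn't match the ID pattern is left untouched as plain text.
function expand_youtube_bbcode(string $text): string
{
    return preg_replace_callback(
        '~\[youtube\]([A-Za-z0-9_-]+)\[/youtube\]~',
        function (array $m): string {
            // The ID is limited to [A-Za-z0-9_-], so it needs no escaping here.
            $url = 'http://www.youtube.com/v/' . $m[1] . '&amp;hl=en&amp;fs=1';
            return '<object width="425" height="344">'
                . '<param name="movie" value="' . $url . '"></param>'
                . '<param name="allowFullScreen" value="true"></param>'
                . '<embed src="' . $url . '" type="application/x-shockwave-flash"'
                . ' allowfullscreen="true" width="425" height="344"></embed>'
                . '</object>';
        },
        $text
    );
}

echo expand_youtube_bbcode('Watch this: [youtube]Dtzs7DSh[/youtube]');
```

The rest of the user's text still needs to be HTML-escaped (for example with htmlspecialchars) before this runs; escaping first is safe because it leaves the brackets and the ID unchanged.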

bobince