Hey guys, I am trying to remove JavaScript from HTML, but I can't get the regex to work in PHP; it gives me a null array. Here is my code:

<?php
$var = '
<script type="text/javascript"> 
function selectCode(a) 
{ 
   var e = a.parentNode.parentNode.getElementsByTagName(PRE)[0]; 
   if (window.getSelection) 
   { 
      var s = window.getSelection(); 
       if (s.setBaseAndExtent) 
      { 
         s.setBaseAndExtent(e, 0, e, e.innerText.length - 1); 
      } 
      else 
      { 
         var r = document.createRange(); 
         r.selectNodeContents(e); 
         s.removeAllRanges(); 
         s.addRange(r); 
      } 
   } 
   else if (document.getSelection) 
   { 
      var s = document.getSelection(); 
      var r = document.createRange(); 
      r.selectNodeContents(e); 
      s.removeAllRanges(); 
      s.addRange(r); 
   } 
   else if (document.selection) 
   { 
      var r = document.body.createTextRange(); 
      r.moveToElementText(e); 
      r.select(); 
   } 
} 
</script>
';

   function remove_javascript($java){
   echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', "", $java);

   }    
?>
+3  A: 

This might do more than you want, but depending on your situation you might want to look at strip_tags.
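As a rough sketch of what "more than you want" means here: strip_tags removes every tag, not just script tags, and it only removes the tags themselves, so the text that was between <script> and </script> stays behind. The sample string and variable names below are made up for illustration.

```php
<?php
// Hypothetical sample input, not taken from the question.
$html = '<p>Hello <b>world</b></p><script>alert(1)</script>';

// Every tag is removed, including harmless markup such as <p> and <b>.
// The script body is plain text as far as strip_tags is concerned, so it remains.
$plain = strip_tags($html);

// The optional second argument lists the tags that should be kept.
$keepParagraphs = strip_tags($html, '<p>');

echo $plain, "\n";
echo $keepParagraphs, "\n";
```

So strip_tags on its own is probably only a fit when you want plain text and don't mind the script source ending up in it.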

deceze
+6  A: 

This should do it:

echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);

The s modifier makes the dot . match newlines too. Without it, (.*?) stops at the first line break, so a multi-line script block never matches.
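A minimal sketch of the difference, using a shortened multi-line input that I made up (the variable names are also just for illustration):

```php
<?php
// Hypothetical input: a script block that spans several lines.
$html = "<p>before</p>\n<script type=\"text/javascript\">\nalert('hi');\n</script>\n<p>after</p>";

// Without the s modifier the dot cannot cross a newline, so nothing matches.
$withoutS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/i', '', $html);

// With the s modifier the dot also matches newlines and the block is removed.
$withS = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html);

var_dump($withoutS === $html); // the input comes back unchanged
echo $withS, "\n";
```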

Just a warning: you should not use this type of regexp to sanitize user input for a website. There are just too many ways to get around it. For sanitizing, use something like the http://htmlpurifier.org/ library.

Tjofras
THANKS A MILLION :)
Saxtor
I think this does not cover the case mentioned before, <scr/* */ipt> which is exactly what someone who tried to bypass such a check would do.
dimitris mistriotis
Will a browser really run something inside `<scr/* */ipt>`? I find that hard to believe...
gnud
I have changed/improved it slightly (especially to match any optional whitespace in the tag, which browsers would ignore, too): `$html = preg_replace('~<\s*\bscript\b[^>]*>(.*?)<\s*\/\s*script\s*>~is', '', $html);`
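A quick sketch of what the relaxed pattern catches that the stricter one misses. The two inputs are made up, and this only shows what the regex matches, not what any particular browser does:

```php
<?php
$strict  = '/<script\b[^>]*>(.*?)<\/script>/is';
$relaxed = '~<\s*\bscript\b[^>]*>(.*?)<\s*\/\s*script\s*>~is';

// Hypothetical inputs with extra whitespace inside the tags.
$spaceInClose = 'a<script>alert(1)</script >b';
$spaceInBoth  = 'a< script type="x">alert(1)< / script >b';

// The strict pattern needs a literal "</script>", so this input is left alone.
$strictResult = preg_replace($strict, '', $spaceInClose);

// The relaxed pattern tolerates the whitespace in both cases.
$relaxedClose = preg_replace($relaxed, '', $spaceInClose);
$relaxedBoth  = preg_replace($relaxed, '', $spaceInBoth);

echo $strictResult, "\n", $relaxedClose, "\n", $relaxedBoth, "\n";
```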
blueyed
A: 

In your case you could treat the string as a list of newline-delimited lines and remove the lines containing the script tags (the first and the second-to-last), and you wouldn't even need regular expressions.

Though if what you are trying to do is prevent XSS, removing only script tags might not be sufficient.
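One possible reading of the line-by-line idea, as a sketch only: drop every line from the one containing the opening tag through the one containing the closing tag. The function name remove_script_lines is made up, and it assumes each script tag sits on its own line, as in the question; anything else sharing those lines is dropped too.

```php
<?php
// Hypothetical helper: removes whole lines from "<script" through "</script>".
function remove_script_lines(string $html): string
{
    $kept = [];
    $insideScript = false;

    foreach (explode("\n", $html) as $line) {
        // A line containing the opening tag starts the block to skip.
        if (!$insideScript && stripos($line, '<script') !== false) {
            $insideScript = true;
        }
        if (!$insideScript) {
            $kept[] = $line;
        }
        // A line containing the closing tag ends the block (and is itself skipped).
        if ($insideScript && stripos($line, '</script>') !== false) {
            $insideScript = false;
        }
    }

    return implode("\n", $kept);
}

echo remove_script_lines("<p>a</p>\n<script>\nalert(1);\n</script>\n<p>b</p>"), "\n";
```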

tosh
Well, thanks for the advice; however, what I am doing is creating a ripper, so that was needed in my class code. Thank you guys!
Saxtor
+1  A: 

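Here's an idea:

// Repeatedly cut out everything from "<script" through the next "</script>".
// Comparing with !== false matters: a match at position 0 is falsy otherwise.
while (($beginning = strpos($var, "<script")) !== false) {
    // Look for the closing tag only after the opening one.
    $end = strpos($var, "</script>", $beginning);
    if ($end === false) {
        break; // unclosed tag: stop instead of looping forever
    }
    $stringLength = ($end + strlen("</script>")) - $beginning;
    // substr_replace returns the new string; it does not modify $var in place.
    $var = substr_replace($var, "", $beginning, $stringLength);
}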

Carlson Technology