I have a string that contains PHP code, and I need to remove the PHP code from it. For example:

<?php $db1 = new ps_DB() ?><p>Dummy</p>

should return <p>Dummy</p>

A string with no PHP in it, for example <p>Dummy</p>, should be returned unchanged.

I know this can be done with a regular expression, but after four hours I still haven't found a solution.

A: 

If you are using PHP, you can use a regular expression to replace anything that looks like a PHP block.

The following statement removes the PHP block from the string (non-greedy, and with the s modifier so that . also matches newlines):

$clean = preg_replace('/<\?php.*?\?>/s', '', '<?php $db1 = new ps_DB() ?><p>Dummy</p>');

If the pattern finds no match, nothing is replaced and the string is returned unchanged.
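
Whether `.*` is greedy starts to matter once a string holds more than one PHP block. A minimal sketch (the input string and variable names are made up for illustration) comparing an anchored, greedy pattern with a non-greedy, unanchored one:

```php
<?php
// Hypothetical input: two PHP blocks with HTML between them.
$input = '<?php a(); ?><p>Dummy</p><?php b(); ?>';

// Anchored and greedy: the match runs from the start of the string to the
// last closing tag, so the HTML in between is removed as well.
$greedy = preg_replace('/^<\?php.*\?>/', '', $input);

// Unanchored and non-greedy: each PHP block is matched separately.
// The s modifier lets the dot match newlines inside a block.
$lazy = preg_replace('/<\?php.*?\?>/s', '', $input);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(12) "<p>Dummy</p>"
```

The non-greedy form also leaves a string without any PHP untouched, as the question requires.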

jeph perro
+2  A: 
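 <?php
 // Keep the text of inline-HTML tokens; every other token (the PHP code) becomes ''.
 function filter_html_tokens($a){
    return is_array($a) && $a[0] == T_INLINE_HTML ?
      $a[1]:
      '';
 }
 $htmlphpstring = '<a>foo</a> something <?php $db1 = new ps_DB() ?><p>Dummy</p>';
 // Tokenize the string as PHP source, filter each token, and join the HTML parts.
 echo implode('',array_map('filter_html_tokens',token_get_all($htmlphpstring)));
 ?>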

As ircmaxell pointed out: this would require valid PHP!

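A regex route would be the following. It allows short tags (no 'php' after <?), tolerates a missing closing ?> at the end of the string or file (for some reason Zend recommends this?), and of course uses an ungreedy pattern. The s modifier lets . match newlines, so multi-line PHP blocks are covered too:

$clean = preg_replace('/<\?.*?(\?>|$)/s', '', $htmlphpstring);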
Wrikken
Just note that you may not get valid HTML out of the regex solution: `<?php $foo='?>'; $bar = 'something';?><b>foo</b>` will yield `'; $bar = 'something';?><b>foo</b>`. The short of it is that there's no perfect solution. Combine the two to get a "best" result.
ircmaxell
Indeed, no perfect solution. If the actual problem can be solved higher up, so that our thought-up kludges don't have to be used, that would be far preferable.
Wrikken
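
The string-literal case from these comments can be reproduced directly. A minimal sketch: the helper names `strip_php_regex` and `looks_truncated` are hypothetical, the pattern adds the `s` modifier, and the leftover-tag check is only a heuristic assumed here for illustration, not something the answers propose:

```php
<?php
// Hypothetical helper wrapping the regex route; the s modifier lets the dot match newlines.
function strip_php_regex(string $source): string {
    return preg_replace('/<\?.*?(\?>|$)/s', '', $source);
}

// Heuristic (an assumption, not a guarantee): a closing tag left in the output
// suggests the pattern stopped inside a string literal.
function looks_truncated(string $result): bool {
    return strpos($result, '?>') !== false;
}

$input  = "<?php \$foo='?>'; \$bar = 'something';?><b>foo</b>";
$result = strip_php_regex($input);

var_dump($result);
var_dump(looks_truncated($result));
```

Here `$result` is `'; $bar = 'something';?><b>foo</b>` and the check returns true, while a clean result such as `<b>foo</b>` returns false.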
A: 

Well, you can use DomDocument to do it...

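function stripPHPFromHTML($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    removeProcessingInstructions($dom);
    // Import the <body> element that loadHTML() builds around the markup.
    $simple = simplexml_import_dom($dom->getElementsByTagName('body')->item(0));
    return $simple->children()->asXML();
}

function removeProcessingInstructions(DOMNode $node) {
    if (!$node->hasChildNodes()) {
        return;
    }
    // Iterate over a copy: childNodes is a live list, so removing a child
    // while looping over it directly can skip the next sibling.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMProcessingInstruction) {
            $node->removeChild($child);
        } else {
            removeProcessingInstructions($child);
        }
    }
}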

Those two functions strip the PHP and leave the HTML:

$str = '<?php echo "foo"; ?><b>Bar</b>';
$clean = stripPHPFromHTML($str);
// $clean is now '<b>Bar</b>'

Edit: Actually, after looking at Wrikken's answer, I realized that both methods have a disadvantage. Mine requires somewhat valid HTML markup (DOM is decent, but it won't parse <b>foo</b><?php echo $bar). Wrikken's requires valid PHP (any syntax error and it'll fail). So perhaps use a combination of the two: try one first, and if it fails, try the other. If both fail, there's really not much you can do without working out exactly why they failed.
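
The "try one, then the other" idea could be sketched roughly as follows. Several things here are assumptions rather than anything from the answers: the function names are hypothetical, `TOKEN_PARSE` (PHP 7+) is used as the failure signal because it makes `token_get_all()` throw a `ParseError` on invalid PHP, and the fallback is Wrikken's regex (with the `s` modifier) rather than the DOM functions, purely to keep the sketch short. `stripPHPFromHTML()` could be swapped in as the fallback instead.

```php
<?php
// Tokenizer route. With TOKEN_PARSE, token_get_all() throws a ParseError
// on invalid PHP, which this sketch treats as "this method failed".
function strip_with_tokenizer(string $source): string {
    $html = '';
    foreach (token_get_all($source, TOKEN_PARSE) as $token) {
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $html .= $token[1];
        }
    }
    return $html;
}

// Regex route, standing in for the second method in this sketch.
function strip_with_regex(string $source): string {
    return preg_replace('/<\?.*?(\?>|$)/s', '', $source);
}

// Try the tokenizer first; fall back to the regex if the PHP is invalid.
function strip_php(string $source): string {
    try {
        return strip_with_tokenizer($source);
    } catch (ParseError $e) {
        return strip_with_regex($source);
    }
}

echo strip_php('<?php echo "foo"; ?><b>Bar</b>'), "\n"; // <b>Bar</b>
echo strip_php('<b>foo</b><?php echo $bar'), "\n";      // <b>foo</b>
```

The second call is the unterminated example from above: the missing semicolon makes the parse fail, so the regex handles it.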

ircmaxell
Good point, with invalid PHP mine would indeed fail. Added it to the answer for good measure.
Wrikken