What's the best way to remove comments from a PHP file?

I want to do something similar to php_strip_whitespace(), but without removing the line breaks.

For example:

I want this:

<?PHP
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
/* another long 
comment
*/
some_more_code();
?>

to become:

<?PHP
if ($whatsit) {
    do_something();
    echo '<html>Some embedded HTML</html>';
}
some_more_code();
?>

(Although if the empty lines remain where comments are removed, that wouldn't be ok).

It may not be possible because of the requirement to preserve embedded HTML; that's what has tripped up the solutions I've found on Google.

A: 

The catch is that a less robust matching algorithm (a simple regex, for instance) will start stripping here, where it clearly shouldn't, because the `/*` is inside a string literal:

if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {

It might not affect your code, but eventually someone will get bitten by your script, so you will need a utility that understands more of the language than you might otherwise expect.
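To make the failure concrete, here is a minimal sketch comparing a naive regex (a lazy `/* ... */` match, chosen here for illustration, not taken from any answer) with PHP's tokenizer on a line modelled on the snippet above. The variable names `$index` and `$permalink` are placeholders.

```php
<?php
// A source string with "/*" inside a string literal, followed by a real comment.
// $index and $permalink are placeholder names modelled on the snippet above.
$src = "<?php\nif (preg_match('#^/*' . \$index . '#', \$permalink)) { /* real comment */ }\n";

// Naive approach: lazily match from the first "/*" to the next "*/".
$naive = preg_replace('#/\*.*?\*/#s', '', $src);

// Tokenizer approach: only T_COMMENT / T_DOC_COMMENT tokens are comments.
$comments = [];
foreach (token_get_all($src) as $token) {
    if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
        $comments[] = $token[1];
    }
}

echo $naive;          // the regex has eaten part of the string literal and the code after it
print_r($comments);   // the tokenizer finds only the real comment
```

The regex starts matching at the `/*` inside `'#^/*'` and runs to the end of the real comment, leaving broken code; the tokenizer knows that `'#^/*'` is a string literal.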

Adam Davis
+14  A: 

I'd use the tokenizer. Here's my solution; it should work on both PHP 4 and PHP 5:

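$fileStr = file_get_contents('path/to/file');
$newStr  = '';

// Token types that count as comments. T_COMMENT always exists;
// the others depend on the PHP version.
$commentTokens = array(T_COMMENT);

if (defined('T_DOC_COMMENT'))
    $commentTokens[] = T_DOC_COMMENT; // PHP 5
if (defined('T_ML_COMMENT'))
    $commentTokens[] = T_ML_COMMENT;  // PHP 4

$tokens = token_get_all($fileStr);

foreach ($tokens as $token) {
    if (is_array($token)) {
        if (in_array($token[0], $commentTokens)) {
            // Before PHP 8, a // or # comment token includes its trailing
            // newline; keep it so the next line isn't joined onto this one.
            if (substr($token[1], -1) == "\n")
                $newStr .= "\n";
            continue;
        }

        // Any other token: keep its source text.
        $token = $token[1];
    }

    // Single-character tokens such as ";" or "{" are plain strings.
    $newStr .= $token;
}

echo $newStr;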
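The question also asks that no empty lines be left where comments stood. Below is a minimal sketch of one way to do that with the same tokenizer idea, assuming PHP 7.1+; `strip_comments()` is a hypothetical name, not a built-in. Each comment is replaced by just its newline characters, so output line N is still original line N; afterwards, only lines that contained a comment and are now blank get dropped. It is run here on the question's sample:

```php
<?php
// Hypothetical helper (not from the answers above); assumes PHP 7.1+.
// Removes comments with the tokenizer, keeps every line break, then drops
// lines that contained only a comment.
function strip_comments(string $source): string
{
    $out = '';
    $commentLines = [];  // original line numbers touched by a comment
    $prevId = null;      // token id of the last token appended to $out
    $prevText = '';      // its text (only used when it was whitespace)

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ";" or "{" are plain strings.
            $out .= $token;
            $prevId = null;
            continue;
        }
        [$id, $text, $line] = $token;

        if ($id !== T_COMMENT && $id !== T_DOC_COMMENT) {
            $out .= $text;
            $prevId = $id;
            $prevText = $text;
            continue;
        }

        // A "//" or "#" comment runs to the end of the line, so the spaces
        // or tabs in the whitespace token just before it can go too.
        $isLineComment = $text[0] === '#' || strncmp($text, '//', 2) === 0;
        if ($isLineComment && $prevId === T_WHITESPACE) {
            $out = substr($out, 0, -strlen($prevText)) . rtrim($prevText, " \t");
        }

        // Record every line the comment covers; a trailing newline (PHP < 8)
        // ends the comment's line and does not count as a further line.
        $last = $line + substr_count(rtrim($text, "\r\n"), "\n");
        for ($l = $line; $l <= $last; $l++) {
            $commentLines[$l] = true;
        }

        // Keep only the comment's newline characters so line numbers stay aligned.
        $out .= preg_replace('/[^\r\n]+/', '', $text);
        $prevId = $id;
    }

    // Drop lines that held a comment and are now blank.
    $kept = [];
    foreach (explode("\n", $out) as $i => $lineText) {
        if (isset($commentLines[$i + 1]) && trim($lineText) === '') {
            continue;
        }
        $kept[] = $lineText;
    }
    return implode("\n", $kept);
}

$sample = <<<'PHP'
<?PHP
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
/* another long
comment
*/
some_more_code();
?>
PHP;

echo strip_comments($sample), "\n";
```

Only lines that actually held a comment are candidates for removal, so blank lines elsewhere (in code, heredocs or inline HTML) are left alone.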
Ionuț G. Stan
ta for the code! Will try it tomorrow.
benlumley
this sorted it out, ta
benlumley
Glad I could help.
Ionuț G. Stan
You should move the `$commentTokens` initialization out of the `foreach` block; otherwise, +1 and thanks :)
Raveren
@Raveren, you're damn right. I have no idea what I was thinking back then when I put that piece of code inside the loop. Thanks for pointing it out.
Ionuț G. Stan
+9  A: 

How about using php -w to generate a file stripped of comments and whitespace, then using a beautifier like PHP_Beautifier to reformat for readability?
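`php -w` has a function counterpart, php_strip_whitespace(), which the PHP manual describes as working like `php -w` on the command line. A minimal sketch of that first step on part of the question's sample; the temporary file exists only because the function takes a filename, and the beautifier step is not shown:

```php
<?php
$sample = <<<'PHP'
<?php
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
PHP;

// php_strip_whitespace() reads from a file, so write the sample to a temporary one.
$file = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents($file, $sample);
$stripped = php_strip_whitespace($file);
unlink($file);

// Comments are gone, but most line breaks are gone too, hence the beautifier step.
echo $stripped, "\n";
```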

Paul Dixon
That's a good option as well.
benlumley
+1 - this is probably the best option.
Adam Davis
thanks for the suggestion - the other way was quicker to use, as all the bits were already on the server.
benlumley
Yes, I like the tokeniser answer, simpler!
Paul Dixon
+2  A: 
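$fileStr = file_get_contents('file.php');
foreach (token_get_all($fileStr) as $token) {
    // Only array tokens carry a type; skip everything that isn't a comment.
    if (!is_array($token) || ($token[0] != T_COMMENT && $token[0] != T_DOC_COMMENT)) {
        continue;
    }
    // Caution: str_replace() removes every occurrence of the comment text,
    // including identical text inside string literals elsewhere in the file.
    $fileStr = str_replace($token[1], '', $fileStr);
}

echo $fileStr;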

Edit: I realised Ionut G. Stan has already suggested this, but I will leave the example here.

Tom Haigh
I think the above snippet should work just fine. It's actually simpler than I thought.
Ionuț G. Stan
A: 

This works: http://devpro.it/remove_phpcomments/

countach