views:

4371

answers:

10

Do you know any good tools for nicely formatting messy php code? Preferably a script for Aptana/Eclipse, but a standalone tool will do too.

+3  A: 

If you use Zend Development Environment, you can use the Indent Code feature (Ctrl+Shift+F).

Ian P
Also included in Eclipse
Bob Fanger
+2  A: 

The Zend Development Environment is now an Eclipse plugin, you may be able to run it alongside Aptana and just use it's Indent Code feature.

Zend Studio

I haven't upgraded to the Eclipse plugin yet myself, I love the previous ZDE so much. Though now that I've started actually using Eclipse for other languages, I'm almost ready to make the leap.

Adam
+2  A: 

Here's a php code beautifier (PHP of course) class:
http://www.codeassembly.com/A-php-code-beautifier-that-works/

and

online demo:

http://www.codeassembly.com/examples/beautifier.php

micahwittman
+5  A: 

PHP Code Beautifier is a useful free tool that should do what you're after, although their download page does require an account to be created.

The tool has been declined into 3 versions:

  • A GUI version which allow to process file visually.
  • A command line version which allow to be batched or integrated with other tools (CVS, SubVersion, IDE ...).
  • As an integrated tool of PHPEdit.

Basically, it'll turn:

if($code == BAD){$action = REWRITE;}else{$action = KEEP;}
for($i=0; $i<10;$i++){while($j>0){$j++;doCall($i+$j);if($k){$k/=10;}}}

into

if ($code == BAD) {
    $action = REWRITE;
} else {
    $action = KEEP;
}
for($i = 0; $i < 10;$i++) {
    while ($j > 0) {
        $j++;
        doCall($i + $j);
        if ($k) {
            $k /= 10;
        }
    }
}
ConroyP
FWIW, this appears to be a closed-source app and is not available for Mac OS X.
nohat
+8  A: 

Well here is my very basic and rough script:

#!/usr/bin/php
<?php
class Token {
    public $type;
    public $contents;

    public function __construct($rawToken) {
        if (is_array($rawToken)) {
            $this->type = $rawToken[0];
            $this->contents = $rawToken[1];
        } else {
            $this->type = -1;
            $this->contents = $rawToken;
        }
    }
}

$file = $argv[1];
$code = file_get_contents($file);

$rawTokens = token_get_all($code);
$tokens = array();
foreach ($rawTokens as $rawToken) {
    $tokens[] = new Token($rawToken);
}

function skipWhitespace(&$tokens, &$i) {
    global $lineNo;
    $i++;
    $token = $tokens[$i];
    while ($token->type == T_WHITESPACE) {
        $lineNo += substr($token->contents, "\n");
        $i++;
        $token = $tokens[$i];
    }
}

function nextToken(&$j) {
    global $tokens, $i;
    $j = $i;
    do {
        $j++;
        $token = $tokens[$j];
    } while ($token->type == T_WHITESPACE);
    return $token;
}

$OPERATORS = array('=', '.', '+', '-', '*', '/', '%', '||', '&&', '+=', '-=', '*=', '/=', '.=', '%=', '==', '!=', '<=', '>=', '<', '>', '===', '!==');

$IMPORT_STATEMENTS = array(T_REQUIRE, T_REQUIRE_ONCE, T_INCLUDE, T_INCLUDE_ONCE);

$CONTROL_STRUCTURES = array(T_IF, T_ELSEIF, T_FOREACH, T_FOR, T_WHILE, T_SWITCH, T_ELSE);
$WHITESPACE_BEFORE = array('?', '{', '=>');
$WHITESPACE_AFTER = array(',', '?', '=>');

foreach ($OPERATORS as $op) {
    $WHITESPACE_BEFORE[] = $op;
    $WHITESPACE_AFTER[] = $op;
}

$matchingTernary = false;

// First pass - filter out unwanted tokens
$filteredTokens = array();
for ($i = 0, $n = count($tokens); $i < $n; $i++) {
    $token = $tokens[$i];
    if ($token->contents == '?') {
        $matchingTernary = true;
    }
    if (in_array($token->type, $IMPORT_STATEMENTS) && nextToken($j)->contents == '(') {
        $filteredTokens[] = $token;
        if ($tokens[$i + 1]->type != T_WHITESPACE) {
            $filteredTokens[] = new Token(array(T_WHITESPACE, ' '));
        }
        $i = $j;
        do {
            $i++;
            $token = $tokens[$i];
            if ($token->contents != ')') {
                $filteredTokens[] = $token;
            }
        } while ($token->contents != ')');
    } elseif ($token->type == T_ELSE && nextToken($j)->type == T_IF) {
        $i = $j;
        $filteredTokens[] = new Token(array(T_ELSEIF, 'elseif'));
    } elseif ($token->contents == ':') {
        if ($matchingTernary) {
            $matchingTernary = false;
        } elseif ($tokens[$i - 1]->type == T_WHITESPACE) {
            array_pop($filteredTokens); // Remove whitespace before
        }
        $filteredTokens[] = $token;
    } else {
        $filteredTokens[] = $token;
    }
}
$tokens = $filteredTokens;

function isAssocArrayVariable($offset = 0) {
    global $tokens, $i;
    $j = $i + $offset;
    return $tokens[$j]->type == T_VARIABLE &&
        $tokens[$j + 1]->contents == '[' &&
        $tokens[$j + 2]->type == T_STRING &&
        preg_match('/[a-z_]+/', $tokens[$j + 2]->contents) &&
        $tokens[$j + 3]->contents == ']';
}

// Second pass - add whitespace
$matchingTernary = false;
$doubleQuote = false;
for ($i = 0, $n = count($tokens); $i < $n; $i++) {
    $token = $tokens[$i];
    if ($token->contents == '?') {
        $matchingTernary = true;
    }
    if ($token->contents == '"' && isAssocArrayVariable(1) && $tokens[$i + 5]->contents == '"') {
        /*
         * Handle case where the only thing quoted is the assoc array variable.
         * Eg. "$value[key]"
         */
        $quote = $tokens[$i++]->contents;
        $var = $tokens[$i++]->contents;
        $openSquareBracket = $tokens[$i++]->contents;
        $str = $tokens[$i++]->contents;
        $closeSquareBracket = $tokens[$i++]->contents;
        $quote = $tokens[$i]->contents;        
        echo $var . "['" . $str . "']";
        $doubleQuote = false;
        continue;
    }
    if ($token->contents == '"') {
        $doubleQuote = !$doubleQuote;
    }
    if ($doubleQuote && $token->contents == '"' && isAssocArrayVariable(1)) {
        // don't echo "
    } elseif ($doubleQuote && isAssocArrayVariable()) {
        if ($tokens[$i - 1]->contents != '"') {
            echo '" . ';
        }
        $var = $token->contents;
        $openSquareBracket = $tokens[++$i]->contents;
        $str = $tokens[++$i]->contents;
        $closeSquareBracket = $tokens[++$i]->contents;
        echo $var . "['" . $str . "']";
        if ($tokens[$i + 1]->contents != '"') {
            echo ' . "';
        } else {
            $i++; // process "
            $doubleQuote = false;
        }
    } elseif ($token->type == T_STRING && $tokens[$i - 1]->contents == '[' && $tokens[$i + 1]->contents == ']') {
        if (preg_match('/[a-z_]+/', $token->contents)) {
            echo "'" . $token->contents . "'";
        } else {
            echo $token->contents;
        }
    } elseif ($token->type == T_ENCAPSED_AND_WHITESPACE || $token->type == T_STRING) {
        echo $token->contents;
    } elseif ($token->contents == '-' && in_array($tokens[$i + 1]->type, array(T_LNUMBER, T_DNUMBER))) {
        echo '-';
    } elseif (in_array($token->type, $CONTROL_STRUCTURES)) {
        echo $token->contents;
        if ($tokens[$i + 1]->type != T_WHITESPACE) {
            echo ' ';
        }
    } elseif ($token->contents == '}' && in_array($tokens[$i + 1]->type, $CONTROL_STRUCTURES)) {
        echo '} ';
    } elseif ($token->contents == '=' && $tokens[$i + 1]->contents == '&') {
        if ($tokens[$i - 1]->type != T_WHITESPACE) {
            echo ' ';
        }
        $i++; // match &
        echo '=&';
        if ($tokens[$i + 1]->type != T_WHITESPACE) {
            echo ' ';          
        }
    } elseif ($token->contents == ':' && $matchingTernary) {
        $matchingTernary = false;
        if ($tokens[$i - 1]->type != T_WHITESPACE) {
            echo ' ';
        }
        echo ':';
        if ($tokens[$i + 1]->type != T_WHITESPACE) {
            echo ' ';
        }
    } elseif (in_array($token->contents, $WHITESPACE_BEFORE) && $tokens[$i - 1]->type != T_WHITESPACE &&
        in_array($token->contents, $WHITESPACE_AFTER) && $tokens[$i + 1]->type != T_WHITESPACE) {
        echo ' ' . $token->contents . ' ';
    } elseif (in_array($token->contents, $WHITESPACE_BEFORE) && $tokens[$i - 1]->type != T_WHITESPACE) {
        echo ' ' . $token->contents;
    } elseif (in_array($token->contents, $WHITESPACE_AFTER) && $tokens[$i + 1]->type != T_WHITESPACE) {
        echo $token->contents . ' ';
    } else {
        echo $token->contents;
    }
}
grom
Why use an existing, tested tool when you can write your own? +1 for writing it in php.
postfuturist
WHAT? Did you actually wrote it?
M28
Yes I wrote my own cause I couldn't find any existing one that I was happy with.
grom
Can someone please turn this into an Eclipse plugin?
Randell
+2  A: 

Check out phpDesigner, it has a beautifier tool that works pretty well.

barfoon
+1  A: 

What about this one:

http://universalindent.sourceforge.net/

It combines a bunch of formatters out there, and will generate the scripts you need so you can pass them out and get your team members to use them before committing next time... Though... formatters might mess up your code and render it unusable...

SeanJA
If the code is unusable, what's the point of using this formatter?
Viet
Any formatter that runs automatically will have this risk, so it is worth noting
SeanJA
Formatters don't have to break your code. They have to parse it accurately. Those that "parse" using regexes usually fail to do so, thus... breakage.
Ira Baxter
It could also run out of memory or run into an exception or you could write some code that the parser does not understand (features that have been added to a more recent version of the language but are not yet supported by your parser).
SeanJA
A formatter isn't likely to run out of memory; a gigabyte of RAM is a lot even compare to 100K lines of PHP script. If you can write code that the parser doesn't understand, then the parser is stupidly implemented; good ones are not, and good ones are up-to-date with respect to the PHP langauge facilities.
Ira Baxter
Ok... but if *you* do not keep the formatter up to date... then you will screw up your code.
SeanJA
Also, since this one includes the formatter that was in the accepted solution, I don't see why you are debating the fact that it can screw up your code.
SeanJA
+1  A: 

The SD PHP Formatter will reliably format your code. It uses a compiler-based front end to parse the code, so it doesn't misinterpret the code and damage it. Consequently its formatted output always works.

Ira Baxter
$50 is a bit much for something I can't even try before I buy.
deadprogrammer
You can try it. Go to the download page.
Ira Baxter
A: 

There's a pear module that formats your code. PHP Beautifier

projecktzero
+1  A: 

http://en.sourceforge.jp/projects/pdt-tools/

^^^ will give you a proper CTRL+SHIFT+F Eclipse/Aptana PHP formatter like Java.

See here for installation help.

eclipse php code formatter

Chris