Ok, I'm reading through a file now, line by line. I know each function's name in the file, since it is defined elsewhere in an XML document. Each definition should look like this:

function function_name

Where function_name is the name of the function.

I get all of the function definitions from an XML document, which I have already put into an array of function names, and I need to grab just those functions from the PHP file and rebuild that file so that it only has those functions in it. That is to say, if a PHP file has more functions than are defined in the XML, then I need to strip out the extra functions and rewrite the .php file with only the functions that the user specified in the XML file.

So, the dilemma I face is how to determine the END of a function reading line by line, and I'm aware that functions can have functions within them. So I don't want to remove the functions within them. Just functions that are standalone and aren't defined within the accompanying XML file. Any ideas on how to do this??

Ok, I'm using the following function now:

//!!! - Used to grab the contents of all functions within a file with the functions array.
function get_functions($source, $functions = array()) 
{
    global $txt;

    if (!file_exists($source) || !is_readable($source))
        return '';

    $tokens = token_get_all(file_get_contents($source));

    foreach($functions as $funcName)
    {
        for($i=0,$z=count($tokens); $i<$z; $i++)
        {
            if (is_array($tokens[$i]) && $tokens[$i][0] == T_FUNCTION && is_array($tokens[$i+1]) && $tokens[$i+1][0] == T_WHITESPACE && is_array($tokens[$i+2]) && $tokens[$i+2][1] == $funcName)
                break;

            $accumulator = array();
            // collect tokens from function head through opening brace
            while($tokens[$i] != '{' && ($i < $z)) { 
               $accumulator[] = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
               $i++;
            }
            if($i == $z) {
                // handle error
                fatal_error($txt['error_occurred'], false);
            } else {
               // note, accumulate, and position index past brace
               $braceDepth = 1; 
               $accumulator[] = '{';
               $i++;
            }
            while($braceDepth > 0 && ($i < $z)) {
               if(is_array($tokens[$i]))
                  $accumulator[] = $tokens[$i][1];
               else {
                  $accumulator[] = $tokens[i];
                  if($tokens[$i] == '{') $braceDepth++;
                  else if($tokens[i] == '}') $braceDepth--;
               }
               $i++;
            }
            $functionSrc = implode(null,$accumulator);
        }
    }

    return $functionSrc;
}

OK, so it takes this php files content:

<?php
function module_testing($params)
{
    // Is it installed?
    $test_param = !isset($params['test_param']) ? 'Testing Testing 1 2 3!' : $params['test_param'];

    // Grab the params, if they exist.
    if (is_array($params))
    {           
        echo $test_param;
    }
    // Throw an error.
    else
        module_error();
}

?>

and changes it like so:

<?php

function module_testing($params)

{

    // Is it installed?

    $test_param  isset$params'test_param'  'Testing Testing 1 2 3!'  $params'test_param'



    // Grab the params, if they exist.

    if is_array$params



        echo $test_param



    // Throw an error.

    else

        module_error





?>

As you can see it took a whole bunch of stuff outta here. And the last closing bracket is missing... All I need to do is check if the function exists in here function module_testing, and grab the entire function and write it to the same file. Seems simple enough, but WoW, this is some major coding for just this minor thing IMO...

Or I could also check whether a function is defined in here that isn't within the $functions array and, if so, just remove that function. Perhaps it's easier with this approach instead??

Thanks :)

+2  A: 

You probably want to try the PHP Tokenizer.

http://www.php.net/manual/en/ref.tokenizer.php

From an external script:

<?php

var_dump(token_get_all(file_get_contents('myscript.php')));

?>
Sarfraz
OMG and how does that work? I see the output from it, but darn if it makes any sense to me...
SoLoGHoST
How do I use that approach to determine the start and end of a function?? I mean, look at the var_dump above...? the function name is "module_testing" and yes, I see it in there, but how do I use this, perhaps an example would be great.
SoLoGHoST
@SoLoGHoST: I have just given an example using `var_dump`, see the documentation of `token_get_all` for more info here: http://php.net/manual/en/function.token-get-all.php
Sarfraz
Sorry, but anyone can do var_dump, that's not an example. Well, thanks anyways...
SoLoGHoST
By the way, my answer below is the way to do it. Thanks anyways :)
SoLoGHoST
A: 

The PHP tokenizer Sarfraz mentioned is a good idea, particularly if you're going to be doing a lot of code rewriting beyond what you've mentioned here.

However, this case might be simple enough you wouldn't need it.

A php function, if it's well formed, should have:

1) A "head", which looks like function funcname($arg1,...,$argn). You can probably locate this and pull this out with a regex.

2) Following the head, a "body", which is going to consist of everything after the head that's included within a pair of matched curly braces. So, you have to figure out how to match them. One way to do this would be to specify a $curlyBraceDepth variable. Start it at 0, and then starting with the curly brace that opens the body of the function, walk through the code one character at a time. Every time you encounter an opening brace, increment $curlyBraceDepth. Every time you encounter a closing brace, decrement it. When $curlyBraceDepth < 1 (i.e., when you're back at depth 0), you'll have finished walking through the body of the function. While you're going through checking each character, you'll either want to be accumulating each character you're reading in an array, or if you've already got this all in a string in memory, marking the start and end position so you can pull it out later.

Now, there's a big caveat here: if any of your functions are handling unmatched curly braces as characters inside of strings -- not particularly common, but absolutely legal and possible php -- then you're also going to have to add conditional code to parse strings as separate tokens. While you could conceivably write your own code to handle this as well, if you're concerned about it as a corner case, Tokenizer is probably the robust way to go.
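To make that caveat concrete, here is a minimal sketch (the sample source in $src is made up for illustration). It compares a naive character count with a count over the output of token_get_all, where the whole string literal arrives as one array token and its brace is never looked at:

```php
<?php
// Hypothetical sample: the string literal contains an unmatched '{'.
$src = '<?php function f() { return "a { b"; }';

// Naive character count: the brace inside the string is counted too,
// so the braces look unbalanced.
$naiveDepth = substr_count($src, '{') - substr_count($src, '}');

// Token-based count: only the single-character '{' and '}' tokens are
// counted; the string literal is one array token and is skipped.
$tokenDepth = 0;
foreach (token_get_all($src) as $token) {
    if ($token === '{') {
        $tokenDepth++;
    } elseif ($token === '}') {
        $tokenDepth--;
    }
}

echo "naive: $naiveDepth, tokens: $tokenDepth\n"; // naive: 1, tokens: 0
```

The naive count ends at 1 (it never sees the function as closed), while the token count ends at 0 as it should.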

But, you'd be using something like the algorithm I gave above as you scan through the tokens, anyway -- find the tokens signifying the head, sort through the tokens comprising the body, counting the opening and closing curly-brace tokens to keep track of your brace depth, accumulating the tokens as you go and concatenating them when you reach zero brace depth.

UPDATE (using Tokenizer)

token_get_all takes care of lumping individual characters of source into syntactically significant PHP tokens. Here's a quick example. Let's say we have the following string of PHP source:

$s = '<?php function one() { return 1; } ?>';

And we run it through token_get_all:

$tokens = token_get_all($s);

If you do a print_r on this, here's what you'll see (with some inlined comments):

Array
(
    [0] => Array
        (
            [0] => 367      // token number (also known by constant T_OPEN_TAG)
            [1] => <?php    // token literal as found in source
            [2] => 1        
        )

    [1] => Array
        (
            [0] => 333      // token number (also known by constant T_FUNCTION)
            [1] => function // token literal as found in source
            [2] => 1       
        )

    [2] => Array
        (
            [0] => 370      // token number (aka T_WHITESPACE)
            [1] =>          // you can't see it, but it's there. :)
            [2] => 1
        )

    [3] => Array
        (
            [0] => 307      // token number (aka T_STRING)
            [1] => one      // hey, it's the name of our function
            [2] => 1
        )

    [4] => (                // literal token - open paren
    [5] => )                // literal token - close paren
    [6] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 1
        )

    [7] => {
    [8] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 1
        )

    [9] => Array
        (
            [0] => 335
            [1] => return
            [2] => 1
        )

    [10] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 1
        )

    [11] => Array
        (
            [0] => 305
            [1] => 1
            [2] => 1
        )

    [12] => ;
    [13] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 1
        )

    [14] => }
    [15] => Array
        (
            [0] => 370
            [1] =>  
            [2] => 1
        )

    [16] => Array
        (
            [0] => 369
            [1] => ?>
            [2] => 1
        )

)

Notice that some of the entries in the array are character literals (parentheses and braces, in fact, which makes this easier than I thought). Others are arrays, containing a "token number" at the 0 index, the token literal at the 1 index, and the line number on which the token starts at the 2 index. If you want the "token name" -- really, a PHP constant that evaluates to the token number -- you can use the token_name function. For example, that familiar first token, with the number 367, is referred to by the name and PHP constant T_OPEN_TAG. (The numeric values differ between PHP versions, so compare against the constants rather than the numbers.)
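If the raw print_r dump is hard to read, a short loop with token_name gives a friendlier listing. This is just a sketch for exploring the token stream; the variable names are arbitrary:

```php
<?php
$s = '<?php function one() { return 1; } ?>';
$tokens = token_get_all($s);

foreach ($tokens as $token) {
    if (is_array($token)) {
        // [0] = token id, [1] = source text, [2] = line number
        printf("%-18s %-12s line %d\n",
            token_name($token[0]), var_export($token[1], true), $token[2]);
    } else {
        // Single-character tokens such as ( ) { } ; come back as plain strings.
        printf("%-18s %s\n", 'literal', var_export($token, true));
    }
}
```

Each array token is printed with its constant name instead of a version-specific number, which makes it much easier to spot the T_FUNCTION / T_WHITESPACE / T_STRING sequence that starts a function.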

If you wanted to use this to copy the source of function 'one' from file A to file B, you could do $tokens = token_get_all(file_get_contents('file_A')), and then search for the sequence of literal tokens that signifies the start of that function -- in our case, T_FUNCTION, T_WHITESPACE, and a T_STRING that's equal to 'one'. So:

for($i=0,$z=count($tokens); $i<$z; $i++)
   if( is_array($tokens[$i]) 
    && $tokens[$i][0] == T_FUNCTION
    && is_array($tokens[$i+1])
    && $tokens[$i+1][0] == T_WHITESPACE
    && is_array($tokens[$i+2])
    && $tokens[$i+2][1] == 'one')
      break;

At this point, you'd do what I described earlier: start at the opening curly brace for the body of the function at a brace depth of 1, watch for curly brace tokens, keep track of depth, and accumulate tokens:

$accumulator = array();
// collect tokens from the function head up to the opening brace
while ($i < $z && $tokens[$i] !== '{') {
   $accumulator[] = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
   $i++;
}
if ($i == $z) {
   // handle error: the function head or its opening brace was not found
   $braceDepth = 0;
} else {
   // note, accumulate, and position index past brace
   $braceDepth = 1;
   $accumulator[] = '{';
   $i++;
}
while ($braceDepth > 0 && $i < $z) {
   if (is_array($tokens[$i]))
      $accumulator[] = $tokens[$i][1];
   else {
      // single-character tokens are plain strings; keep them and count braces
      $accumulator[] = $tokens[$i];
      if ($tokens[$i] == '{') $braceDepth++;
      else if ($tokens[$i] == '}') $braceDepth--;
   }
   $i++; // advance to the next token, otherwise this loop never ends
}
$functionSrc = implode('', $accumulator);
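Putting the two fragments together, here is one way the whole thing could look as a single function. Treat it as a sketch rather than a finished tool: extract_function_source is a name I made up, and it adds one thing the fragments above don't do, namely counting T_CURLY_OPEN and T_DOLLAR_OPEN_CURLY_BRACES as opening braces. Those are the tokens the tokenizer emits for "{$var}" and "${var}" inside double-quoted strings, and their closing '}' comes back as a plain '}' token, so ignoring them would throw the depth off.

```php
<?php
// Hypothetical helper: return the source of function $name found in $code,
// or null if it is not there. Does not handle "function &name()".
function extract_function_source($code, $name)
{
    $tokens = token_get_all($code);
    $count  = count($tokens);
    $opens  = array(T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES);

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // The next non-whitespace token must be the name we are after.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $count || !is_array($tokens[$j]) || $tokens[$j][0] !== T_STRING
            || strcasecmp($tokens[$j][1], $name) !== 0) {
            continue;
        }

        // Accumulate from the "function" keyword through the matching brace.
        $src = '';
        $depth = 0;
        $seenBrace = false;
        for ($k = $i; $k < $count; $k++) {
            $token = $tokens[$k];
            $src .= is_array($token) ? $token[1] : $token;
            if ($token === '{' || (is_array($token) && in_array($token[0], $opens, true))) {
                $depth++;
                $seenBrace = true;
            } elseif ($token === '}') {
                $depth--;
            }
            if ($seenBrace && $depth === 0) {
                return $src;
            }
        }
        return null; // ran out of tokens: unbalanced braces
    }
    return null; // no function with that name
}

echo extract_function_source('<?php function one() { return 1; } ?>', 'one'), "\n";
```

Function names are compared case-insensitively because PHP itself treats them that way.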
Weston C
Thanks, but I don't understand this tokenizer thing, it seems too complicated for what I want to do. Even after reading it over at the link that Sarfraz linked to.
SoLoGHoST
It's a little complicated, but really not that bad. I've added a section that tries to explain how it works more closely with some more example code.
Weston C
Wow, thanks a lot bro :)
SoLoGHoST
This is excellent!!! :) What a great Example!
SoLoGHoST
Hello again, I have tested this script, and I get the following error on this line `$accumulator[] = $tokens[$i][1];` saying this: `Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 4194304 bytes) in blah/blah/blah/blah.php on line 1382`, please note blah/blah are actual paths to my server. Any ideas why? And this is a fairly small file.
SoLoGHoST
It's probably because I forgot an `$i++` at the end of that loop. :) So it's adding that first token (probably T_WHITESPACE) over and over and over until it runs out of memory. Oops.
Weston C
Ok, that was the problem with the memory limit, Thank You, but your code has removed these characters as well from within the function `;`, `(`, `)`, `=`, `?`, `:`, it shouldn't remove those characters...
SoLoGHoST
Give me a sec, I will post up the function I am using, I really do appreciate this!! :)
SoLoGHoST
Ok, posted up the function and the original php file, and then the php file after applying the function to it. Hopefully you can spot the problem, and I really appreciate your help. I mean I'd be completely lost without you. Cheers :)
SoLoGHoST
A: 

A function body will - as far as I know - always be enclosed in curly braces: {}. So your job is to scan the PHP file for the start of the function - you said that's not the problem - and then keep scanning until every opened { has been closed.

But what if there is a function or if-clause or anything else in your function which is also using those braces? To manage that you have to implement a $counter, which counts up for each { and down for each }. When the counter reaches zero, the function's end is reached.

Example: Your function:

//lots of functions
function f_unimportant($args) { //Scan the first "{" after your f_unimportant
                                //and set $counter=1;
if($args > '') {                //increase $counter by 1
   //Do stuff
}                               //decrease $counter by 1

echo $result;

}                               //decrease $counter by 1
                                //now $counter is zero and end of function is reached

The counter tells you the depth of your code. If depth=0 function has ended.

Analysis: You have an $array of chars, where your phpfile is stored, beginning after function f_unimportant($args) {.

$counter = 1;
$length = 0; //length of your function (to be able to delete it)
foreach($array as $char) {
   $length ++;
   if($char == '{') {
      $counter ++;
   }
   else if($char == '}') {
      $counter --;
   }

   if($counter == 0) {break;} //leave foreach because end of function is reached
}
//now you just delete $length chars from your phpfile starting at the position
//you already found out, where your function starts.

and do not forget to delete function f_unimportant($args) { as well (it is not counted in $length!)
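The same idea can be sketched with string offsets instead of an array of chars, which makes the deletion a single substr call. matching_brace_offset is a hypothetical name and the sample $code is invented. Like the loop above, it will be fooled by braces inside strings or comments.

```php
<?php
// Hypothetical helper: $openPos must be the offset of a function's opening '{'.
// Returns the offset of the matching '}' or -1 if the braces never balance.
function matching_brace_offset($code, $openPos)
{
    $counter = 0;
    for ($pos = $openPos, $len = strlen($code); $pos < $len; $pos++) {
        if ($code[$pos] === '{') {
            $counter++;
        } elseif ($code[$pos] === '}') {
            $counter--;
        }
        if ($counter === 0) {
            return $pos; // depth is back to zero: end of function
        }
    }
    return -1;
}

$code  = 'function f_unimportant($args) { if ($args > "") { /* stuff */ } echo $args; } function g() {}';
$start = strpos($code, 'function f_unimportant'); // where the head begins
$open  = strpos($code, '{', $start);              // first brace after the head
$end   = matching_brace_offset($code, $open);

// Cut out head and body in one go.
$remaining = substr($code, 0, $start) . substr($code, $end + 1);
```

Because the cut starts at $start rather than at the brace, the function head is removed together with the body.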

Now this is a pretty interesting concept :)
SoLoGHoST
I hope that it will work :-) I haven't tried it myself. The most important thing is that there are no strings inside the function using { or }, like "abc foo { bar", because then it all gets a bit more complicated...
Yeah, I see what you mean. hmmmm, perhaps the tokenizer thingy is the only guaranteed way... Thanks a lot :)
SoLoGHoST
A: 

Ok, guys, I managed to fix this perfectly fine, and on my own, and here is the perfect solution. I want to thank you all for your help with this. Thanks, you guys have gone far beyond helping me here. But I knew this would be a simple solution without using the tokenizer functions. Perhaps you guys have forgotten that I have the name of each function? In any case, thanks again, but the token functions won't be needed for this.

Cheers.

function remove_undefined_functions($source, $functions = array())
{
    if (!file_exists($source) || !is_readable($source))
        return '';

    $code = '';
    $removeStart = false;

    $fp = fopen($source, 'rb');
    while (!feof($fp))
    {
        $output = fgets($fp);
        $funcStart = strpos(strtolower($output), 'function');

        if ($funcStart !== false)
        {
            foreach($functions as $funcName)
            {
                if (strpos($output, $funcName) !== false)
                {
                    $code .= $output;
                    $removeStart = false;
                    break;
                }
                else
                    $removeStart = true;
            }
            continue;
        }
        else
        {
            if (substr($output, 0, 2) == '?>' || !$removeStart)
                $code .= $output;
        }
    }
    fclose($fp);

    // Rewrite the file with the functions that are defined.
    $fo = @fopen($source, 'wb');

    // Get rid of the extra lines...
    @fwrite($fo, str_replace("\r\n", "\n", $code));

    fclose($fo);
}

This does mean that if there is a function inside of a function, then the user will have to define it in the XML as well; otherwise, the outer function will not work properly. That isn't really a big deal to me, since they can have an unlimited number of functions, and it would be better if each function stood on its own.
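For comparison, the same filtering can also be sketched with the tokenizer discussed in the other answers. This is an alternative under stated assumptions, not the code I'm actually using: keep_only_functions is a made-up name, it only looks at named functions at brace depth 0 (so nested functions and class methods are left alone), and it does not handle `use function` imports or `function &name()`.

```php
<?php
// Hypothetical helper: return $code with every top-level named function
// removed unless its name appears in $keep.
function keep_only_functions($code, array $keep)
{
    $keep   = array_map('strtolower', $keep);
    $tokens = token_get_all($code);
    $count  = count($tokens);
    $opens  = array(T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES);
    $out    = '';
    $depth  = 0; // brace depth of the code we are keeping

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        if ($depth === 0 && is_array($token) && $token[0] === T_FUNCTION) {
            // Look past whitespace for the function name.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                $j++;
            }
            $named = $j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING;
            if ($named && !in_array(strtolower($tokens[$j][1]), $keep, true)) {
                // Unwanted function: skip through its matching closing brace.
                $skipDepth = 0;
                $seen = false;
                for (; $i < $count; $i++) {
                    $t = $tokens[$i];
                    if ($t === '{' || (is_array($t) && in_array($t[0], $opens, true))) {
                        $skipDepth++;
                        $seen = true;
                    } elseif ($t === '}') {
                        $skipDepth--;
                    }
                    if ($seen && $skipDepth === 0) {
                        break;
                    }
                }
                continue; // the outer loop steps past the closing brace
            }
        }

        if ($token === '{' || (is_array($token) && in_array($token[0], $opens, true))) {
            $depth++;
        } elseif ($token === '}') {
            $depth--;
        }
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}
```

Writing the result back would then be something like file_put_contents($source, keep_only_functions(file_get_contents($source), $functions)). Unlike the line-based version, this matches whole function names, so a name that is a prefix of another name is not mistaken for it.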

SoLoGHoST