views:

770

answers:

2

I'm trying to iterate over a directory which contains loads of PHP files, and detect what classes are defined in each file.

Consider the following:

$php_files_and_content = new PhpFileAndContentIterator($dir);
foreach($php_files_and_content as $filepath => $sourceCode) {
    // echo $filepath, $sourceCode
}

The above $php_files_and_content variable represents an iterator where the key is the filepath, and the content is the source code of the file (as if that wasn't obvious from the example).

This is then supplied into another iterator which will match all the defined classes in the source code, ala:

class DefinedClassDetector extends FilterIterator implements RecursiveIterator {
    public function accept() {
        return $this->hasChildren();
    }

    public function hasChildren() {
        $classes = getDefinedClasses($this->current());
        return !empty($classes);
    }

    public function getChildren() {
        return new RecursiveArrayIterator(getDefinedClasses($this->current()));
    }
}

$defined_classes = new RecursiveIteratorIterator(new DefinedClassDetector($php_files_and_content));

foreach($defined_classes as $index => $class) {
    // print "$index => $class"; outputs:
    // 0 => Class A
    // 1 => Class B
    // 0 => Class C
}

The reason the $index isn't sequential numerically is because 'Class C' was defined in the second source code file, and thus the array returned starts from index 0 again. This is preserved in the RecursiveIteratorIterator because each set of results represents a separate Iterator (and thus key/value pairs).

Anyway, what I am trying to do now is find the best way to combine these, such that when I iterate over the new iterator, I can get the key is the class name (from the $defined_classes iterator) and the value is the original file path, ala:

foreach($classes_and_paths as $filepath => $class) {
    // print "$class => $filepath"; outputs
    // Class A => file1.php
    // Class B => file1.php
    // Class C => file2.php
}

And that's where I'm stuck thus far.

At the moment, the only solution that is coming to mind is to create a new RecursiveIterator, that overrides the current() method to return the outer iterator key() (which would be the original filepath), and key() method to return the current iterator() value. But I'm not favouring this solution because:

  • It sounds complex (which means the code will look hideous and it won't be intuitive
  • The business rules are hard-coded inside the class, whereas I would like to define some generic Iterators and be able to combine them in such a way to produce the required result.

Any ideas or suggestions gratefully recieved.

I also realise there are far faster, more efficient ways of doing this, but this is also an exercise in using Iterators for myselfm and also an exercise in promoting code reuse, so any new Iterators that have to be written should be as minimal as possible and try to leverage existing functionality.

Thanks

A: 

I think what you want to do, is more or less to reverse the keys and values returned from PhpFileAndContent. Said class returns a list of filepath => source, and you want to first reverse the mapping so it is source => filepath and then expand source for each class defined in source, so it will be class1 => filepath, class2 => filepath.

It should be easy as in your getChildren() you can simply access $this->key() to get the current file path for the source you are running getDefinedClasses() on. You can write getDefinedClasses as getDefinedClasses($path, $source) and instead of returning an indexed array of all the classes, it will return a dictionary where each value from the current indexed array is a key in the dictionary and the value is the filepath where that class was defined.

Then it will come out just as you want.

The other option is to drop your use of RecursiveArrayIterator and instead write your own iterator that is initialized (in getChildren) as

return new FilePathMapperIterator($this->key,getDefinedClasses($this->current()));

and then FilePathMapperIterator will convert the class array from getDefinedClasses to the class => filepath mapping I described by simply iterating over the array and returning the current class in key() and always returning the specified filepath in current().

I think the latter is more cool, but definitely more code so its unlikely that I would have gone that way if I can adapt getDefinedClasses() for my needs.

Guss
+2  A: 

OK, I think I finally got my head around this. Here's roughly what I did in pseudo-code:

Step 1 We need to list the directory contents, thus we can perform the following:

// Reads through the $dir directory
// traversing children, and returns all contents
$dirIterator = new RecursiveDirectoryIterator($dir);

// Flattens the recursive iterator into a single
// dimension, so it doesn't need recursive loops
$dirContents = new RecursiveIteratorIterator($dirIterator);

Step 2 We need to consider only the PHP files

class PhpFileIteratorFilter {
    public function accept() {
        $current = $this->current();
        return    $current instanceof SplFileInfo
               && $current->isFile()
               && end(explode('.', $current->getBasename())) == 'php';
    }
}


// Extends FilterIterator, and accepts only .php files
$php_files = new PhpFileIteratorFilter($dirContents);

The PhpFileIteratorFilter isn't a great use of re-usable code. A better method would have been to be able to supply a file extension as part of the construction and get the filter to match on that. Although that said, I am trying to move away from construction arguments where they are not required and rely more on composition, because that makes better use of the Strategy pattern. The PhpFileIteratorFilter could simply have used the generic FileExtensionIteratorFilter and set itself up interally.

Step 3 We must now read in the file contents

class SplFileInfoReader extends FilterIterator {

    public function accept() {
        // make sure we use parent, this one returns the contents
        $current = parent::current();
        return    $current instanceof SplFileInfo
               && $current->isFile()
               && $current->isReadable();
    }

    public function key() {
        return parent::current()->getRealpath();
    }

    public function current() {
        return file_get_contents($this->key());
    }    
}

// Reads the file contents of the .php files
// the key is the file path, the value is the file contents
$files_and_content = new SplFileInfoReader($php_files);

Step 4 Now we want to apply our callback to each item (the file contents) and somehow retain the results. Again, trying to make use of the strategy pattern, I've done away unneccessary contructor arguments, e.g. $preserveKeys or similar

/**
 * Applies $callback to each element, and only accepts values that have children
 */
class ArrayCallbackFilterIterator extends FilterIterator implements RecursiveIterator {

    public function __construct(Iterator $it, $callback) {
        if (!is_callable($callback)) {
            throw new InvalidArgumentException('$callback is not callable');
        }

        $this->callback = $callback;
        parent::__construct($it);
    }

    public function accept() {
        return $this->hasChildren();
    }

    public function hasChildren() {
        $this->results = call_user_func($this->callback, $this->current());
        return is_array($this->results) && !empty($this->results);
    }

    public function getChildren() {
        return new RecursiveArrayIterator($this->results);
    }
}


/**
 * Overrides ArrayCallbackFilterIterator to allow a fixed $key to be returned
 */
class FixedKeyArrayCallbackFilterIterator extends ArrayCallbackFilterIterator {
    public function getChildren() {
        return new RecursiveFixedKeyArrayIterator($this->key(), $this->results);
    }
}


/**
 * Extends RecursiveArrayIterator to allow a fixed $key to be set
 */
class RecursiveFixedKeyArrayIterator extends RecursiveArrayIterator {

    public function __construct($key, $array) {
        $this->key = $key;
        parent::__construct($array);
    }

    public function key() {
        return $this->key;
    }
}

So, here I have my basic iterator which will return the results of the $callback I supplied through, but I've also extended it to create a version that will preserve the keys too, rather than using a constructor argument for it.

And thus we have this:

// Returns a RecursiveIterator
// key: file path
// value: class name
$class_filter = new FixedKeyArrayCallbackFilterIterator($files_and_content, 'getDefinedClasses');

Step 5 Now we need to format it into a suitable manner. I desire the file paths to be the value, and the keys to be the class name (i.e. to provide a direct mapping for a class to the file in which it can be found for the auto loader)

// Reduce the multi-dimensional iterator into a single dimension
$files_and_classes = new RecursiveIteratorIterator($class_filter);

// Flip it around, so the class names are keys
$classes_and_files = new FlipIterator($files_and_classes);

And voila, I can now iterate over $classes_and_files and get a list of all defined classes under $dir, along with the file they're defined in. And pretty much all of the code used to do this is re-usable in other contexts as well. I haven't hard-coded anything in the defined Iterator to achieve this task, nor have I done any extra processing outside the iterators

JamShady