I've been trying to replicate GNU find ("find .") in PHP, but it seems impossible to get even close to its speed. The PHP implementations take at least twice as long as find. Are there faster ways of doing this in PHP?

EDIT: I added a code example using the SPL implementation -- its performance is equal to the iterative approach

EDIT2: When calling find from PHP it was actually slower than the native PHP implementation. I guess I should be satisfied with what I've got :)

// took about 317% of the run time of GNU find run directly from a shell
function list_recursive($dir) { 
  if ($dh = opendir($dir)) {
    while (false !== ($entry = readdir($dh))) {
      if ($entry == '.' || $entry == '..') continue;

      $path = "$dir/$entry";
      echo "$path\n";
      if (is_dir($path)) list_recursive($path);       
    }
    closedir($dh);
  }
}

// took about 315% of the run time of GNU find run directly from a shell
function list_iterative($from) {
  $dirs = array($from);  
  while (NULL !== ($dir = array_pop($dirs))) {  
    if ($dh = opendir($dir)) {    
      while (false !== ($entry = readdir($dh))) {      
        if ($entry == '.' || $entry == '..') continue;        

        $path = "$dir/$entry";        
        echo "$path\n";        
        if (is_dir($path)) $dirs[] = $path;        
      }      
      closedir($dh);      
    }    
  }  
}

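// took about 315% of the run time of GNU find run directly from a shell
function list_recursivedirectoryiterator($path) {
  // SKIP_DOTS drops "." and ".."; RecursiveIteratorIterator is what descends into subdirectories
  $dir = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
  $it = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);
  foreach ($it as $file) {
    echo $file->getPathname(), "\n";
  }
}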

// took about 390% of the run time of GNU find run directly from a shell
function list_gnufind($dir) { 
  $dir = escapeshellarg($dir); // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir", "r");
  while ('' != ($s = fread($h, 2048))) {
    echo $s;
  }
  pclose($h);
}
+2  A: 

Try using RecursiveDirectoryIterator; see the example and the manual page for it.

Andrew Clark
Good advice (it didn't perform better, though).
neu242
+1  A: 

You're keeping N directory streams open, where N is the depth of the directory tree. Instead, try reading an entire directory's worth of entries at once, and then iterate over the entries. At the very least you'll maximize use of the disk I/O caches.

Jason Cohen
+3  A: 

I'm not sure if the performance is better, but you could use a recursive directory iterator to make your code simpler. See RecursiveDirectoryIterator and SplFileInfo.

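// SKIP_DOTS drops "." and ".."; the outer iterator is what recurses into subdirectories
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($from, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($it as $file)
{
    echo $file->getPathname(), "\n";
}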
Greg
Good advice (it didn't perform better, though).
neu242
+2  A: 

Why would you expect the interpreted PHP code to be as fast as the compiled C version of find? Being only twice as slow is actually pretty good.

About the only advice I would add is to call ob_start() at the beginning and ob_get_contents() and ob_end_clean() at the end. That might speed things up.
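A minimal sketch of that buffering idea, assuming a listing function such as list_iterative() from the question. The wrapper name list_buffered() is made up for illustration; it collects everything the callback prints and hands it back as one string, so the caller can emit it in a single write.

```php
<?php
// Hypothetical helper (not from the question): runs any listing callback
// with output buffering on and returns what it printed as one string.
function list_buffered(callable $lister, $dir) {
    ob_start();                  // start collecting echo output in memory
    try {
        $lister($dir);           // e.g. 'list_iterative' from the question
    } finally {
        $out = ob_get_clean();   // fetch the buffer and turn buffering off again
    }
    return $out;
}
```

Usage would look like echo list_buffered('list_iterative', '.'); whether this helps at all depends on how the output is consumed.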

jmucchiello
Yes, just as good would be very optimistic :) Output buffering didn't help, by the way...
neu242
+3  A: 

Before you start changing anything, profile your code.

Use something like Xdebug (plus kcachegrind for a pretty graph) to find out where the slow parts are. If you start changing things blindly, you won't get anywhere.

My only other advice is to use the SPL directory iterators as posted already. Letting the internal C code do the work is almost always faster.
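For a coarse first measurement before setting up a real profiler, something like the following could be used to compare the variants from the question. This is only wall-clock timing, not profiling; the helper name time_call() is made up for illustration, and the output is discarded so that terminal speed does not dominate the result.

```php
<?php
// Hypothetical helper: wall-clock seconds taken by one call of $fn($arg).
// Output is buffered and thrown away, so only the traversal itself is timed.
function time_call(callable $fn, $arg) {
    ob_start();
    $t0 = microtime(true);
    $fn($arg);
    $elapsed = microtime(true) - $t0;
    ob_end_clean();              // discard whatever the callback printed
    return $elapsed;
}
```

For example, printf("%.3f s\n", time_call('list_iterative', '.')); run a few times for each variant gives a rough comparison.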

Ant P.
A: 

You might want to seriously consider just using GNU find. If it's available, and safe mode isn't turned on, you'll probably like the results just fine:

function list_recursive($dir) { 
  $dir = escapeshellarg($dir); // quote the path as a single shell argument
  $h = popen("/usr/bin/find $dir -type f", "r");
  while ($s = fgets($h,1024)) { 
    echo $s;
  }
  pclose($h);
}

However, there might be some directory that's so big that you're not going to want to bother with this either. Consider amortizing the slowness in other ways. Your second try can be checkpointed (for example) by simply saving the directory stack in the session. If you're giving the user a list of files, simply collect a pageful, then save the rest of the state in the session for page 2.
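A minimal sketch of that checkpointing idea, under the assumption that the stack holds every path discovered but not yet reported. The function name list_page() and its return shape are made up for illustration.

```php
<?php
// Hypothetical helper: returns up to $limit paths plus the stack needed to resume.
// $stack holds paths that have been discovered but not yet reported.
function list_page(array $stack, $limit) {
    $page = array();
    while (count($page) < $limit && null !== ($path = array_pop($stack))) {
        $page[] = $path;
        // Assumption: symlinked directories are not followed, to avoid loops
        if (is_dir($path) && !is_link($path)) {
            $entries = scandir($path);
            if ($entries === false) continue;   // unreadable directory: skip it
            foreach (array_diff($entries, array('.', '..')) as $entry) {
                $stack[] = "$path/$entry";      // remember for this or a later page
            }
        }
    }
    return array($page, $stack);
}
```

On the first request the stack would be array($dir); after each call, the returned stack can be stored in $_SESSION and passed back in for the next page. An empty stack means the listing is complete.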

geocar
When embedding the find utility like this, the performance is actually worse than PHP's performance. I guess I should be satisfied :)
neu242
Use escapeshellarg() and shell_exec() instead.
troelskn
+1  A: 

PHP just cannot perform as fast as C, plain and simple.

zodeus
A: 

Try using scandir() to read a whole directory at once, as Jason Cohen has suggested. I've based the following code on code from the PHP manual comments for scandir():

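function scan($dir) {
    // scandir() reads the whole directory at once; drop the "." and ".." entries
    $entries = array_diff(scandir($dir), array(".", ".."));
    foreach ($entries as $entry) {
        $path = $dir . "/" . $entry;
        if (is_dir($path)) {
            scan($path);          // recurse into subdirectories
        } else {
            echo $path, "\n";     // print files only, as in the original
        }
    }
}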
Ian Gregory