EDIT: Optimization results at end of this question!

Hi, I have the following code, which first scans the files in a specific folder, then reads every file line by line and, after numerous "if...else if" checks, writes a new modified file to another folder under the same name it had when opened.

The problem is that writing a file line by line seems to be awfully slow. The default 60-second limit is only enough for 25 or so files. File sizes vary from 10k to 350k.

Is there any way to optimize the code to make it run faster? Is it better to read line by line, put every line into an array, and then write that whole array to a new text file (vs. reading and writing line by line)? If so, how is it done in practice?

Thanks in advance. ----- The code follows -----

<?php

function scandir_recursive($path)    {
...
...
}



$fileselection = scandir_recursive('HH_new');
foreach ($fileselection as $extractedArray) {
    $tableName = basename($extractedArray); // Table name
    $fileLines=file($extractedArray);
    foreach ($fileLines as $line) {
      if(preg_match('/\(all-in\)/i' , $line)) {
       $line = stristr($line, ' (all-in)', true) .', and is all in';
       $allin = ', and is all in';
      }
      else {
       $allin = '';
      }
      if(preg_match('/posts the small blind of \$[\d\.]+/i' , $line)) {
       $player = stristr($line, ' posts ', true);
       $betValue = substr(stristr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue;
      }
      else if(preg_match('/posts the big blind of \$[\d\.]+/i' , $line)) {
       $player = stristr($line, ' posts ', true);
       $betValue = substr(stristr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue;
      }
      else if(preg_match('/\S+ raises /i' , $line)) {
       $player = stristr($line, ' raises ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue; //total bet this hand (shortcut)
      }
      else if(preg_match('/\S+ bets /i' , $line)) {
       $player = stristr($line, ' bets ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue; //total bet this hand (shortcut)
      }
      else if(preg_match('/\S+ calls /i' , $line)) {
       $player = stristr($line, ' calls ', true);
       $betValue = substr(stristr($line, '$'), 1);
       $callValue = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount called
       $bettingMatrix[$player]['betTotal'] = $betValue;
       $line = stristr($line, '$', true)."\$".$callValue.$allin;
       $allin = '';
      }
      else if(preg_match('/(\*\*\* (Flop|Turn|River))|(Full Tilt Poker)/i' , $line)) {
       unset($bettingMatrix); //zero $betValue
      }
      else if(preg_match('/\*\*\* FLOP \*\*\*/i' , $line)) {
       $flop = substr(stristr($line, '['), 0, -2);
       $line = '*** FLOP *** '. $flop;
      }
      else if(preg_match('/\*\*\* TURN \*\*\*/i' , $line)) {
       $turn = substr(stristr($line, '['), 0, -2);
       $line = '*** TURN *** '. $flop .' '. $turn;
      }
      else if(preg_match('/\*\*\* RIVER \*\*\*/i' , $line)) {
       $river = substr(stristr($line, '['), 0, -2);
       $line = '*** RIVER *** '. substr($flop, 0, -1) .' '. substr($turn, 1) .' '. $river;
      }
      else {
      }
     $ourFileHandle = fopen("HH_newest/".$tableName.".txt", 'a') or die("can't open file");
     fwrite($ourFileHandle, $line);
     fclose($ourFileHandle);
    }
}
?>


EDIT: Here are some VERY interesting results after rewriting the code based on the tips everyone here gave me.

60 text files, 5.8MB total

After all optimizations (changed preg -> strpos/strstr and opened $handle before the loop): 4 sec.

Same as above, but with strpos/strstr changed to stripos/stristr: 8 sec.

Same as above, but with stripos/stristr changed to preg: 12 sec.

Same as above, but with fopen moved back inside the loop: 45 of 60 files done when the 180 sec run limit was reached.
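
For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the script that produced the numbers above: the sample lines are synthetic, and timeTest() is a hypothetical helper written only for this illustration. Absolute timings will differ from machine to machine.

```php
<?php
// Synthetic sample data (an assumption, not real hand-history files).
$lines = [];
for ($i = 0; $i < 20000; $i++) {
    $lines[] = ($i % 5 === 0)
        ? "Hero calls \$0.50 (all-in)\r\n"
        : "Villain folds\r\n";
}

// Hypothetical helper: runs $test on every line and returns
// [number of matching lines, elapsed seconds].
function timeTest(callable $test, array $lines): array
{
    $start = microtime(true);
    $hits = 0;
    foreach ($lines as $line) {
        if ($test($line)) {
            $hits++;
        }
    }
    return [$hits, microtime(true) - $start];
}

// The three ways of testing for "(all-in)" compared in the post.
$tests = [
    'strpos' => function ($l) {
        return strpos($l, '(all-in)') !== false;
    },
    'stripos' => function ($l) {
        return stripos($l, '(all-in)') !== false;
    },
    'preg_match' => function ($l) {
        return preg_match('/\(all-in\)/i', $l) === 1;
    },
];

foreach ($tests as $name => $test) {
    list($hits, $seconds) = timeTest($test, $lines);
    printf("%-10s %d matches in %.4f s\n", $name, $hits, $seconds);
}
```

Running each variant over the same lines and checking that the match counts agree helps confirm that the faster test is still finding the same lines.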

Here's the complete script:

$fileselection = scandir_recursive('HH_new');
foreach ($fileselection as $extractedArray) {
    $tableName = basename($extractedArray); // Table name
    $handle   = fopen($extractedArray, 'r');
    $ourFileHandle = fopen("HH_newest/".$tableName.".txt", 'a') or die("can't open file");
    while (($line = fgets($handle)) !== false) {
      if (FALSE !== strpos($line, '(all-in)')) {
       $line = strstr($line, ' (all-in)', true) .", and is all in\r\n";
       $allin = ', and is all in';
      } else {
       $allin = '';
      }
      if (FALSE !== strpos($line, ' posts the small blind of $')) {
       $player = strstr($line, ' posts ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue;
      }
      else if (FALSE !== strpos($line, ' posts the big blind of $')) {
       $player = strstr($line, ' posts ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] = $betValue;
      }
      else if (FALSE !== strpos($line, ' posts $')) {
       $player = strstr($line, ' posts ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $bettingMatrix[$player]['betTotal'] += $betValue;
      }
      else if (FALSE !== strpos($line, ' raises to $')) {
       $player = strstr($line, ' raises ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $betMade = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount raised by
       $bettingMatrix[$player]['betTotal'] = $betValue; //$line contains total bet this hand (shortcut)
      }
      else if (FALSE !== strpos($line, ' bets $')) {
       $player = strstr($line, ' bets ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $betMade = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount raised by
       $bettingMatrix[$player]['betTotal'] = $betValue; //$line contains total bet this hand (shortcut)
      }
      else if (FALSE !== strpos($line, ' calls $')) {
       $player = strstr($line, ' calls ', true);
       $betValue = substr(strstr($line, '$'), 1);
       $callValue = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount called
       $bettingMatrix[$player]['betTotal'] = $betValue;
       $line = strstr($line, '$', true)."\$".$callValue.$allin. "\r\n";
       $allin = '';
      }
      else if (FALSE !== strpos($line, '*** FLOP ***')) {
       $flop = substr(strstr($line, '['), 0, -2);
       unset($bettingMatrix); //zero $betValue
      }
      else if (FALSE !== strpos($line, '*** TURN ***')) {
       $turn = substr(strstr($line, '['), 0, -2);
       $line = '*** TURN *** '.$flop.' '.$turn."\r\n";
       unset($bettingMatrix); //zero $betValue
      }
      else if (FALSE !== strpos($line, '*** RIVER ***')) {
       $river = substr(strstr($line, '['), 0, -2);
       $line = '*** RIVER *** '. substr($flop, 0, -1) .' '. substr($turn, 1) .' '. $river."\r\n";
       unset($bettingMatrix); //zero $betValue
      }
      else if (FALSE !== strpos($line, 'Full Tilt Poker')) {
       unset($bettingMatrix); //zero $betValue
      }
      else {
      }
     fwrite($ourFileHandle, $line);
    }
    fclose($handle);
    fclose($ourFileHandle);
}
+4  A: 

I doubt the file writing is the performance issue here. You're running ten regular expressions on everything!

Using string methods like strpos to find the sub-strings might speed things up.

Ben S
How do I replace my "preg_match('/\S+ calls /i' , $line)" with strpos()? Is there a substitute for "\S+"?
mika
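One possible way to approximate the "\S+ calls " pattern with plain string functions is sketched below. hasActionAfterName() is a hypothetical helper, not something from the thread, and it only inspects the first occurrence of the action word, so it is an approximation of the regex rather than an exact equivalent. If every action line is known to start with a player name, a plain strpos($line, ' calls ') !== false test may be all that is needed.

```php
<?php
// Hypothetical helper (not from the original post): approximates
// preg_match('/\S+ <action> /i', $line) without a regular expression.
function hasActionAfterName(string $line, string $action): bool
{
    $pos = stripos($line, ' ' . $action . ' ');

    // false: the action was not found at all.
    // 0: the line starts with the space, so nothing precedes the action.
    if ($pos === false || $pos === 0) {
        return false;
    }

    // \S+ needs at least one non-whitespace character before the space.
    return !ctype_space($line[$pos - 1]);
}

var_dump(hasActionAfterName('Hero calls $0.50', 'calls'));
var_dump(hasActionAfterName('Hero folds', 'calls'));
```

The first call reports a match and the second does not, because "Hero folds" contains no " calls " at all.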
It's only 10 if none of the first 9 match. Given that, rearranging the if-elseifs to test in order of probability would likely improve performance somewhat.
Tom
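To decide on an order, one option is to count how often each test fires on a sample of real lines. The sketch below uses a few made-up sample lines and a shortened list of markers taken from the question; both are assumptions for illustration. Note that reordering only preserves behavior when at most one test can match a given line.

```php
<?php
// Hypothetical sample; in practice these would come from real files.
$sample = [
    'Hero calls $0.50',
    'Villain folds',
    'Hero bets $1',
    'Villain calls $1',
    '*** FLOP *** [2c 3d 4h]',
    'Villain calls $2',
];

// A subset of the substrings tested in the question.
$markers = [' calls ', ' bets ', ' raises ', '*** FLOP ***'];

$counts = array_fill_keys($markers, 0);
foreach ($sample as $line) {
    foreach ($markers as $marker) {
        if (strpos($line, $marker) !== false) {
            $counts[$marker]++;
            break; // like if/else-if: only the first matching test counts
        }
    }
}

arsort($counts); // most frequent marker first
print_r($counts);
```

With this sample, ' calls ' comes out on top with 3 matches, which suggests testing for it first.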
+2  A: 

Doing away with the regular expressions would give you the biggest performance increase. If you can change them to strpos() or similar (stripos() for case-insensitive matching), you should notice a speed increase.

The test needs to be '!== false', since the found string may be at position 0, which a loose check would treat as false. For example, your first test case could be:

if(stripos($line, '(all-in)') !== false) {
    //generate output
}
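
As a small illustration of why the strict comparison matters, the sketch below uses a made-up line in which the match sits at the very start of the string. The variable names are only for this example.

```php
<?php
// Hypothetical line where the needle is at position 0.
$line = '(all-in) Hero raises';

$pos = stripos($line, '(all-in)');    // int 0: found at the very start

// Loose truthiness treats position 0 the same as "not found".
$looseSaysFound  = (bool) $pos;        // false (wrong answer)

// Comparing against true never works: the function returns int or false.
$equalsTrue      = ($pos === true);    // false (wrong answer)

// Strict comparison tells int 0 apart from boolean false.
$strictSaysFound = ($pos !== false);   // true (correct answer)

var_dump($pos, $looseSaysFound, $equalsTrue, $strictSaysFound);
```

So the strict test is about reliability rather than speed: it is the only one of the three that reports a match at position 0.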

You may also find that using fgets() instead of reading the whole file at once gives you some performance increase (though that's more of a memory issue). And as mentioned by others, only write to the file inside the loop; don't open and close it there.

Tim Lytle
I've been wondering why it has to be "is not false" instead of just "true". Does it give better performance or reliability?
mika
@mika - The PHP manual page for strpos() prominently explains that.
GZipp
@mika I added a short explanation, and linked the function name to the manual page.
Tim Lytle
performance gains added to original post. :)
mika
+4  A: 

I think this is because you're opening and closing the file within the loop. Try moving fopen() before the inner foreach and fclose() after it.

stereofrog
Then it will only write the last line of each file into the new file, won't it?
mika
I agree that you definitely don't want to be doing I/O that many times. It might be quicker just to append each modified line to a new string then write out the new string once.
ianhales
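A minimal sketch of the "build one string, write once" idea follows. The temp-file paths and the single "(all-in)" rewrite are stand-ins chosen for illustration, not the asker's real folders or full logic. With files of 10k to 350k, holding one file's output in memory should be unproblematic.

```php
<?php
// Hypothetical input and output files in the system temp directory.
$source = tempnam(sys_get_temp_dir(), 'hh_in_');
$target = tempnam(sys_get_temp_dir(), 'hh_out_');
file_put_contents($source, "Hero calls \$1 (all-in)\r\nVillain folds\r\n");

$buffer = '';
$handle = fopen($source, 'r');
while (($line = fgets($handle)) !== false) {
    // Stand-in for the real per-line rewriting logic.
    if (strpos($line, ' (all-in)') !== false) {
        $line = strstr($line, ' (all-in)', true) . ", and is all in\r\n";
    }
    $buffer .= $line; // collect in memory instead of writing right away
}
fclose($handle);

// One write call for the whole modified file.
file_put_contents($target, $buffer);
```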
As long as you leave the fwrite() in the loop, it will just leave the file open and continue to write each line to it.
Sean
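To illustrate that point, here is a small self-contained sketch: the file is opened once, fwrite() stays inside the loop, and every line ends up in the output. The temp-file path and the three sample lines are assumptions for the example; it uses mode 'w' so repeated runs start from an empty file, whereas the question uses 'a' to append.

```php
<?php
// Hypothetical output file in the system temp directory.
$target = tempnam(sys_get_temp_dir(), 'hh_out_');
$lines  = ["line one\r\n", "line two\r\n", "line three\r\n"];

$out = fopen($target, 'w');   // opened once, before the loop
foreach ($lines as $line) {
    fwrite($out, $line);      // each call continues after the previous write
}
fclose($out);                 // closed once, after the loop

echo file_get_contents($target); // all three lines, in order
```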
+1  A: 

Here's your code with a few tiny changes that should help quite a bit

  1. Switched from file() to fgets(). This will load only a single line at a time into memory instead of every line from the file.
  2. Changed your calls to preg_match() to stripos() where applicable. This should be a tiny bit faster.
  3. Moved the opening/closing of $ourFileHandle into the outer loop. This will significantly reduce the number of stat calls to the filesystem and should speed it up greatly.

There are probably a lot of other optimizations that can be made in that monstrous if..else, but I'll leave those up to another SOer (or you).

$fileselection = scandir_recursive('HH_new');
foreach ($fileselection as $extractedArray)
{ 
  $tableName     = basename( $extractedArray ); // Table name
  $handle        = fopen( $extractedArray, 'r' );
  $ourFileHandle = fopen("HH_newest/".$tableName.".txt", 'a') or die("can't open file");

  while ( false !== ( $line = fgets( $handle ) ) )
  {
    if ( false !== stripos( $line, '(all-in)' ) )
    {
      $line = stristr($line, ' (all-in)', true) .', and is all in';
      $allin = ', and is all in';
    } else {
      $allin = '';
    }
    if ( preg_match('/posts the small blind of \$[\d\.]+/i' , $line ) )
    {
            $player = stristr($line, ' posts ', true);
            $betValue = substr(stristr($line, '$'), 1);
            $bettingMatrix[$player]['betTotal'] = $betValue;
    }
    else if(preg_match('/posts the big blind of \$[\d\.]+/i' , $line)) {
            $player = stristr($line, ' posts ', true);
            $betValue = substr(stristr($line, '$'), 1);
            $bettingMatrix[$player]['betTotal'] = $betValue;
    }
    else if(preg_match('/\S+ raises /i' , $line)) {
            $player = stristr($line, ' raises ', true);
            $betValue = substr(strstr($line, '$'), 1);
            $bettingMatrix[$player]['betTotal'] = $betValue; //total bet this hand (shortcut)
    }
    else if(preg_match('/\S+ bets /i' , $line)) {
            $player = stristr($line, ' bets ', true);
            $betValue = substr(strstr($line, '$'), 1);
            $bettingMatrix[$player]['betTotal'] = $betValue; //total bet this hand (shortcut)
    }
    else if(preg_match('/\S+ calls /i' , $line)) {
            $player = stristr($line, ' calls ', true);
            $betValue = substr(stristr($line, '$'), 1);
            $callValue = $betValue - $bettingMatrix[$player]['betTotal']; //actual amount called
            $bettingMatrix[$player]['betTotal'] = $betValue;
            $line = stristr($line, '$', true)."\$".$callValue.$allin;
            $allin = '';
    }
    else if(preg_match('/(\*\*\* (Flop|Turn|River))|(Full Tilt Poker)/i' , $line)) {
            unset($bettingMatrix); //zero $betValue
    }
    else if ( FALSE !== stripos( $line, '*** FLOP ***' ) )
    {
            $flop = substr(stristr($line, '['), 0, -2);
            $line = '*** FLOP *** '. $flop;
    }
    else if ( FALSE !== stripos( $line, '*** TURN ***' ) )
    {
            $turn = substr(stristr($line, '['), 0, -2);
            $line = '*** TURN *** '. $flop .' '. $turn;
    }
    else if ( FALSE !== stripos( $line, '*** RIVER ***' ) )
    {
            $river = substr(stristr($line, '['), 0, -2);
            $line = '*** RIVER *** '. substr($flop, 0, -1) .' '. substr($turn, 1) .' '. $river;
    }
    else {
    }
    fwrite($ourFileHandle, $line);
  }
  fclose( $handle );
  fclose( $ourFileHandle );
}
Peter Bailey
Thanks. I will rewrite my code based on your code here and keep in mind the other ideas everybody has shared here.
mika