This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that, with the help of the community, I can optimize this for speed and efficiency:

AJAX POST Request made to this script

<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>

It makes a GET request to this external script which reads a text file

<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
    $lines = file('text/'.$fileName.'.txt');
    echo $lines[sizeof($lines)-1];
}
else{
    echo 0;
}
?>

I would appreciate any help. I think more improvement can be made to the first script. It makes an expensive function call (file_get_contents), or at least I think it's expensive!

A: 

If the files are unchanging, you should cache the last line.

If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order in which lines are written, depending on how often a line is read over its lifetime.
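
For example, a minimal sketch of that prepend-on-write idea, assuming you control the writer (the function names prependLine/newestLine are made up for illustration):

<?php
// Writer: prepend the newest entry so the freshest line is always first.
// NOTE: read-modify-write is not atomic; wrap both calls in flock() if
// several processes write concurrently.
function prependLine($path, $line) {
    $existing = file_exists($path) ? file_get_contents($path) : '';
    file_put_contents($path, $line."\n".$existing, LOCK_EX);
}

// Reader: only the first line has to be read now.
function newestLine($path) {
    $fh = @fopen($path, 'r');
    if (!$fh) {
        return '';
    }
    $line = fgets($fh);
    fclose($fh);
    return $line === false ? '' : rtrim($line, "\n");
}
?>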

Edit:

Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
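
A rough sketch of that memcache idea, assuming the memcached extension and a local server are available (the key names and helper functions are made up for illustration):

<?php
// Writer: store the newest line in memcache right before appending it
// to the log file, so readers rarely have to touch the disk.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211); // assumed local memcached instance

function writeLogLine($cache, $logFile, $line) {
    $cache->set('lastline:'.$logFile, $line);
    file_put_contents($logFile, $line."\n", FILE_APPEND | LOCK_EX);
}

// Reader: try memcache first, fall back to reading the file.
function lastLogLine($cache, $logFile) {
    $line = $cache->get('lastline:'.$logFile);
    if ($line !== false) {
        return $line;
    }
    $lines = file($logFile);
    return $lines ? rtrim(end($lines), "\n") : '';
}
?>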

Thomas L Holaday
Yeah, the files are changing all the time. I like the last idea, reversing the order of lines so I only retrieve the first line. I need every little bit of performance I can get!
Abs
+1  A: 

readfile is your friend here: it reads a file on disk and streams it to the client.

script 1:

<?php
  session_start();
  // added basic argument filtering
  $fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);

  $fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';
  if (file_exists($fileName)) {

      // script 2 could be pasted here

      //for the entire file
      //readfile($fileName);

      //for just the last line
      $lines = file($fileName);
      echo $lines[count($lines)-1];


      exit(0);
  }

  echo 0;
?>

This script could be further improved by adding caching, but that is more complicated. Very basic caching could look like this.

script 2:

<?php

  $lastModifiedTimeStamp = filemtime($fileName);

  if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
      $browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
      if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
          header("HTTP/1.0 304 Not Modified");
          exit(0);
      }
  }

  header('Content-Length: '.filesize($fileName));
  header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
  header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
Jacco
Would you say I should use that if the client only needs the last line?
Abs
'script 1' replaces your 2 scripts. readfile() reads the entire file and sends it to the browser. If you are unsure, forget about 'script 2'.
Jacco
Does file_exists work on remote files (URLs)?
Abs
more info here (yes it can read remote files): http://nl2.php.net/file_exists
Jacco
+1  A: 

This script should limit the locations and file types that it's going to return.

Think of somebody trying this:

http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)

Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?

If the request is slow, try caching results locally.

If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change) and saves it to a file that your other script can access directly.
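
A minimal sketch of such a cron script, assuming the text files live in a directory like /path/to/text (the paths are placeholders):

<?php
// extract-last-lines.php - run from cron, e.g. once a minute:
// * * * * * php /path/to/extract-last-lines.php
// Copies the last line of every text file into a small ".last" file
// that the web-facing script can then read directly and cheaply.
$dir = '/path/to/text'; // assumed location of the text files
foreach ((array) glob($dir.'/*.txt') as $file) {
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines) {
        file_put_contents($file.'.last', end($lines));
    }
}
?>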

Wouter van Nifterick
A: 

First things first: Do you really need to optimize that? Is that the slowest part in your use case? Have you used xdebug to verify that? If you've done that, read on:

You cannot really optimize the first script much: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).

As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as is in that case.

If your files are big, however, you might want to use fopen() and its friends fseek() and fread():

# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;
# Read 200 byte chunks from the file and check if the chunk
# contains a newline
do {
    fseek($filePointer, -($i * $chunkSize), SEEK_END);
    $line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);
return substr($line, $pos + 1);
soulmerge
Thanks for your insightful reply. I have not used that (xdebug) before, but I am looking into it now. The text files I read are no longer than 20 lines, and the lines are about 5-6 words long.
Abs
I wouldn't touch the code, then. The overhead of interpreting the loop could even be greater than the gain with files of that size :)
soulmerge
Ah you're right, thanks! :)
Abs
A: 

The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.

(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)

I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.

If direct access isn't possible, you could use caching. Instead of getting the whole file, just issue a HEAD request and compare the timestamp to a local copy.
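
A rough sketch of that check using get_headers(), assuming the text files are reachable over HTTP and the remote web server sends a Last-Modified header (the URL and cache path below are placeholders):

<?php
// Assumes session_start() was called earlier (as in the original script)
// and that $fileName has already been sanitized.
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$remoteUrl = $_SESSION['serverURL'].'text/'.$fileName.'.txt';
$localCopy = '/tmp/cache_'.md5($remoteUrl).'.txt';

$headers    = get_headers($remoteUrl, 1); // headers only, thanks to HEAD
$remoteTime = isset($headers['Last-Modified'])
    ? strtotime($headers['Last-Modified'])
    : time();

if (!file_exists($localCopy) || filemtime($localCopy) < $remoteTime) {
    // Stale or missing: fetch the whole file once and cache it locally.
    stream_context_set_default(array('http' => array('method' => 'GET')));
    file_put_contents($localCopy, file_get_contents($remoteUrl));
}

$lines = file($localCopy);
echo $lines ? rtrim(end($lines), "\n") : 0;
?>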

Also, if you are AJAX-updating a lot of clients based on the same files, you might consider looking at comet (meteor, for example). It's used for things like chats, where a single change has to be broadcast to several clients.

Ramon Poca