To make this clearer, here are code samples:

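$file = fopen('filename.ext', 'rb');

// Assume $pos has been declared
// method 1: seek to the offset and read only the bytes needed
fseek($file, $pos);
$parsed = fread($file, 2);

// method 2: read the whole file into memory, then slice it
rewind($file); // start from the beginning again if both methods run in one script
$data = '';
while (!feof($file)) {
    // append each chunk; plain assignment would keep only the last one
    $data .= fread($file, 1000000);
}

$data = bin2hex($data);
// bin2hex() emits two hex characters per byte, so byte $pos starts at index $pos * 2
$parsed = substr($data, $pos * 2, 4); // hex form of the same 2 bytes read in method 1

fclose($file);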

There are about 40 fread() calls in method 1 (with maybe 15 fseek() calls) versus a single fread() in method 2. The only thing I am wondering is whether loading 1,000,000 bytes is overkill when I'm really only extracting maybe 100 bytes in total (all relatively close together in the middle of the file).

So which code is going to perform better? Which code makes more sense to use? A quick explanation would be greatly appreciated.

+3  A: 

If you already know the offset you are looking for, fseek is the best method here, as there is no reason to load the whole file into memory if you only need a few bytes of it. The first method is better because you skip right to what you want in the file stream and read out a small portion. The second method requires you to read the entire file into memory and then search through it, when you could have just read the bytes straight from the file. Hope this answers your question.
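As a rough sketch of the seek-then-read approach, something like the following could wrap each lookup. The helper name `read_bytes_at`, the temporary file, and the offset 500000 are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: read exactly $length bytes starting at byte $offset.
function read_bytes_at($handle, int $offset, int $length): string
{
    // fseek() returns 0 on success and -1 on failure.
    if (fseek($handle, $offset) !== 0) {
        throw new RuntimeException("Cannot seek to offset $offset");
    }
    $bytes = fread($handle, $length);
    if ($bytes === false || strlen($bytes) !== $length) {
        throw new RuntimeException("Short read at offset $offset");
    }
    return $bytes;
}

// Throwaway demo file so the sketch runs on its own: two marker bytes in the middle.
$path = tempnam(sys_get_temp_dir(), 'seek');
file_put_contents(
    $path,
    str_repeat("\x00", 500000) . "\xCA\xFE" . str_repeat("\x00", 500000)
);

$file = fopen($path, 'rb');
$parsed = read_bytes_at($file, 500000, 2);
fclose($file);
unlink($path);

echo bin2hex($parsed), "\n";
```

Only the two requested bytes end up in a PHP string, however large the file is.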

robmerica
OTOH if you will need the whole file -eventually- (e.g. you check every byte, in 100 byte chunks), reading it all in is faster IF the file isn't huge - up to several megabytes. Otherwise you risk filling up RAM, running into swap space and slowing everything down to a crawl.
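A minimal sketch of that trade-off, assuming a size cutoff picks the strategy. The 4 MB constant and the function name `read_chunks` are hypothetical; the right cutoff depends on how much memory the script has.

```php
<?php
// Hypothetical cutoff; the comment above only says "up to several megabytes".
const MAX_IN_MEMORY_BYTES = 4 * 1024 * 1024;

// Yields the file in $chunkSize-byte pieces, choosing a strategy by file size.
function read_chunks(string $path, int $chunkSize = 100): Generator
{
    if (filesize($path) <= MAX_IN_MEMORY_BYTES) {
        // Small file: one read, then slice in memory.
        $data = file_get_contents($path);
        for ($i = 0, $n = strlen($data); $i < $n; $i += $chunkSize) {
            yield substr($data, $i, $chunkSize);
        }
        return;
    }

    // Large file: stream it so memory use stays bounded.
    $file = fopen($path, 'rb');
    try {
        while (!feof($file)) {
            $chunk = fread($file, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }
            yield $chunk;
        }
    } finally {
        fclose($file);
    }
}

// Demo on a throwaway 250-byte file.
$path = tempnam(sys_get_temp_dir(), 'chunk');
file_put_contents($path, str_repeat('x', 250));

$sizes = [];
foreach (read_chunks($path) as $chunk) {
    $sizes[] = strlen($chunk);
}
unlink($path);

echo implode(',', $sizes), "\n";
```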
SF.
+2  A: 

Files are read in units of clusters, and a cluster is usually something like 8 KB. Usually a few clusters are read ahead.

So, if the file is only a few KB, there is very little to gain by using fseek compared to reading the entire file. The file system will read the entire file anyway.

If the file is considerably larger, as in your case, only a few of the clusters have to be read, so the first method should perform better. At worst all the data will still be read from the disk, but your application will still use less memory.
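Since the question says the bytes of interest sit close together, one possible middle ground is a single fseek() plus a single fread() covering just that window, then slicing it in memory. This is a variation on the first method, not something from the question; the `$fields` layout (offset => length) and the demo file below are invented for illustration.

```php
<?php
// Hypothetical field layout: absolute file offset => length in bytes.
$fields = [500000 => 2, 500010 => 4, 500090 => 1];

// Throwaway demo file with known bytes at those offsets.
$path = tempnam(sys_get_temp_dir(), 'win');
file_put_contents(
    $path,
    str_repeat("\x00", 500000)      // padding up to offset 500000
    . "\xCA\xFE"                    // offsets 500000-500001
    . str_repeat("\x11", 8)         // offsets 500002-500009
    . "\xDE\xAD\xBE\xEF"            // offsets 500010-500013
    . str_repeat("\x22", 76)        // offsets 500014-500089
    . "\x7F"                        // offset 500090
    . str_repeat("\x00", 1000)      // trailing padding
);

// Work out the smallest window that covers every field.
$start = min(array_keys($fields));
$end = 0;
foreach ($fields as $offset => $length) {
    $end = max($end, $offset + $length);
}

// One seek and one read for the whole window.
$file = fopen($path, 'rb');
fseek($file, $start);
$window = fread($file, $end - $start);
fclose($file);
unlink($path);

// Slice each field out of the in-memory window.
$parsed = [];
foreach ($fields as $offset => $length) {
    $parsed[$offset] = bin2hex(substr($window, $offset - $start, $length));
}

print_r($parsed);
```

Here the window is 91 bytes, far below a typical cluster size, so the disk work should be comparable to the individual seeks while the PHP code makes far fewer calls.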

Guffa
+1  A: 

It seems that seeking to the position you want and then reading only the bytes you need is the best approach.

But the correct answer is (as always) to test it for real instead of guessing. Run your two examples in your server environment and make some time measurements. Also check memory usage. Then make your optimization once you have some hard data to back it up.
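A rough way to take those measurements, assuming a randomly filled 1,000,000-byte test file and a made-up offset; swap in the real file and positions. The `timed` helper is hypothetical.

```php
<?php
// Hypothetical test file and offset; substitute your real file and positions.
$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, random_bytes(1000000));
$pos = 500000;

// Runs a callable and returns [result, elapsed seconds].
function timed(callable $fn): array
{
    $t0 = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $t0];
}

// Method 1: seek to the offset and read two bytes.
[$a, $seekTime] = timed(function () use ($path, $pos) {
    $file = fopen($path, 'rb');
    fseek($file, $pos);
    $bytes = fread($file, 2);
    fclose($file);
    return $bytes;
});

// Method 2: load the whole file, then slice it.
[$b, $fullTime] = timed(function () use ($path, $pos) {
    return substr(file_get_contents($path), $pos, 2);
});

unlink($path);

printf(
    "seek: %.6f s, full read: %.6f s, peak memory: %d bytes\n",
    $seekTime,
    $fullTime,
    memory_get_peak_usage()
);
```

A single run is noisy, so repeat each method many times and compare averages. memory_get_peak_usage() reports the peak for the whole script, so run each method in its own script if you want separate memory figures.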

Martin Wickman