
I have a file that is constantly added to (by a process beyond my control), and I capture that file every x seconds. I want to extract the new contents of the file (added since my previous capture) and work with them. Unfortunately, the file doesn't have anything to signify when it was last added to, and I can't write to it, so my only option is to store what I already know is in the file and compare it to the new version I have.

Now what I need to know is how best to do this. I'm using PHP, and I figured the simplest solution is to store the previous contents and then use explode() to work out what comes after them. This is (quite obviously) a terrible solution: once the file gets large (1GB+), it's going to be hell to process.

An idea I had would be to store the position of the final character and then work from there. For example, if the last character was the 100th, I'd then work from the 100th character on the next run. But I'm not sure how I could do this, or if it's even possible with PHP.

So my question is: what is the correct method for doing this, and how can I do it with PHP (if possible)? Functions or a general idea are fine; I'm good for the implementation, just not sure of the theory behind it.

A:

Assuming the file is only ever appended to, the easiest approach is to store the previous file size and use fseek() or the offset parameter of file_get_contents() to jump to where the old version of the file ended. I.e.:

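clearstatcache(); // PHP caches filesize() results; clear them so the size is current
$old_position = (int)file_get_contents("last_position.temp");
$new_size = filesize("thebigfile.txt");

// Byte offsets start at 0, so the old size is exactly the offset of the first new byte (no off-by-one).
// Reading only up to $new_size means bytes appended while we read are picked up next time.
$new_entry = file_get_contents("thebigfile.txt", false, null, $old_position, $new_size - $old_position);
file_put_contents("last_position.temp", $new_size); // save the position only after a successful read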

To get this rolling for the first time, you'll want to put 0 in last_position.temp so there are no errors or hard feelings.
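Since fseek() is the other option mentioned above, here is a minimal sketch of a stream-based variant. file_get_contents() loads the whole appended chunk into memory at once, which may matter if a lot is written between captures; reading in fixed-size pieces avoids that. The function name read_appended(), the callback parameter, and the 8192-byte chunk size are hypothetical choices for illustration, not anything from PHP itself:

```php
<?php
// Hypothetical helper: reads the bytes appended to $path since the offset stored
// in $statePath, hands them to $handleChunk in pieces, then saves the new offset.
function read_appended(string $path, string $statePath, callable $handleChunk): int
{
    // A missing state file means this is the first run, so start at offset 0.
    $oldPosition = is_file($statePath) ? (int)file_get_contents($statePath) : 0;

    clearstatcache(true, $path);          // filesize() results are cached by PHP
    $newSize = filesize($path);
    if ($newSize === false) {
        throw new RuntimeException("Cannot stat $path");
    }

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, $oldPosition);             // 0-based byte offset of the first new byte

    $remaining = $newSize - $oldPosition; // stop at the size measured above
    while ($remaining > 0 && !feof($fh)) {
        $chunk = fread($fh, min(8192, $remaining));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $handleChunk($chunk);
        $remaining -= strlen($chunk);
    }
    fclose($fh);

    // Save only the offset actually reached, so unread bytes are retried next time.
    $reached = $newSize - $remaining;
    file_put_contents($statePath, (string)$reached);
    return $reached - $oldPosition;       // number of new bytes processed
}
```

A call would look like read_appended("thebigfile.txt", "last_position.temp", function (string $chunk) { /* process $chunk */ });. Note that chunk boundaries fall on byte counts, not line breaks, so if the new data is processed line by line, a partial last line may need to be buffered until the next chunk arrives.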

Hope this helps :)

mattbasta
This looks like exactly what I need, thanks! How reliable is using the file size as the starting position in the new file? Will it give the same position each time?
citricsquid
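`filesize()` returns the exact size of the file in bytes, and the offset used by `file_get_contents()` and `fseek()` is also in bytes, so multi-byte encodings don't change anything: the stored size always points at the first byte written after the previous capture. One caveat: PHP caches `filesize()` results, so in a long-running script call `clearstatcache()` before each check.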
mattbasta
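To illustrate the multi-byte point: filesize(), strlen(), and the offset parameter of file_get_contents() all count bytes rather than characters, so reading from the old size returns exactly the appended text. A small self-contained check, assuming the script itself is saved as UTF-8; the temp file and sample strings are illustrative only:

```php
<?php
// Throwaway file standing in for the real log (illustrative only).
$path = tempnam(sys_get_temp_dir(), 'mb');

file_put_contents($path, "héllo\n");          // 'é' is 2 bytes in UTF-8
clearstatcache(true, $path);
$oldSize = filesize($path);                    // 7 bytes, although only 6 characters

file_put_contents($path, "wörld\n", FILE_APPEND);
clearstatcache(true, $path);

// Reading from the old byte size returns exactly the appended text.
$new = file_get_contents($path, false, null, $oldSize);

echo $oldSize, "\n";                            // 7
echo strlen("héllo\n"), "\n";                   // 7 (bytes, not characters)
echo $new;                                      // wörld

unlink($path);
```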