views: 261
answers: 7
What is the best way to write to files in a large PHP application? Let's say there are lots of writes needed per second. How should I go about this?

Could I just open the file and append the data? Or should I open, lock, write, and unlock?

What will happen if the file is being worked on and other data needs to be written? Will that activity be lost, or will it be saved? And if it is saved, will it halt the application?

If you have been, thank you for reading!

+2  A: 

If concurrency is an issue, you should really be using databases.

Pickle
I am writing to a log file. The write load on the database is heavy already, so for the log file I would like to just write to a file.
Saif Bechan
A: 

Use flock()

See this question
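
A minimal sketch of a locked append, assuming a placeholder log path of /tmp/app.log:

<?php
$fh = fopen('/tmp/app.log', 'a');      // placeholder path
if ($fh) {
    if (flock($fh, LOCK_EX)) {         // block until we hold an exclusive lock
        fwrite($fh, "log line\n");
        fflush($fh);                   // flush buffered output before releasing the lock
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}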

Alex
A: 

If you just need to append data, PHP should be fine with that, as the filesystem should take care of simultaneous appends.

dusoft
The standard Linux filesystems most certainly do *not* "take care" of simultaneous appends. For a simple example of why, imagine this: two processes open the same file within milliseconds of each other and each appends a gig of data at 100 MB per second -- meaning a 10-second write time for each process. How would the filesystem ensure that the writes don't overlap? Is it supposed to buffer the data from the second process until the first process completes?
Frank Farmer
Well, since you have used an absolutely absurd example, I will just say this: PHP is unable to append 100 MB per second, nor can it process that much data. Although you may be correct about hypothetical situations like this, reality is a different thing and, as someone described above, appending just works. Eat that.
dusoft
Posted a simple, real example showing multiple simultaneous appends clobbering each other. http://stackoverflow.com/questions/2358818/how-to-write-to-file-in-large-php-applicationmultiple-questions/2360472#2360472
Frank Farmer
+4  A: 

I do have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I have not noticed any problems with that; each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already existing content, especially with concurrency, I would go with locking, otherwise a big mess can happen...

Laimoncijus
+1. I get overlapping entries in my Apache access log every day, so at high load some loss definitely occurs. But for a log file, the few rows that get mangled really don't matter, and they certainly aren't worth the overhead of file locking.
Frank Farmer
+3  A: 

Here's a simple example that highlights the danger of simultaneous writes:

<?php
for ($i = 0; $i < 100; $i++) {
    $pid = pcntl_fork();
    // only spawn more children if we're not a child ourselves
    if (!$pid) {
        break;
    }
}

$fh = fopen('test.txt', 'a');

//The following is a simple attempt to get multiple threads to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);

$myPid = posix_getpid();
//create a line starting with pid, followed by 10,000 copies of
//a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A')+$myPid%25), 10000) . "\n";
for($i = 0; $i < 1; $i++) {
    fwrite($fh, $line);
}

fclose($fh);

echo "done\n";

If appends were safe, you should get a file with 100 lines, all of which are roughly 10,000 chars long and begin with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, however, a few appends will conflict and the output will get mangled.

You can find corrupted lines with grep '^[^0-9]' test.txt

This is because file append is only atomic if:

  1. You make a single fwrite() call
  2. and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
  3. and you write to a fully POSIX-compliant filesystem

If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.

Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
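
For illustration, here is a minimal sketch that stays within those rules, reusing test.txt from the script above: build the whole entry first, then emit it with a single short fwrite().

<?php
// Build the entire entry up front so it goes out in one fwrite() call,
// well under PIPE_BUF; the pid tags the line as in the script above.
$entry = sprintf("%s %d %s\n", date('c'), posix_getpid(), 'something happened');

$fh = fopen('test.txt', 'a');
fwrite($fh, $entry);   // single, short append: atomic under the conditions listed above
fclose($fh);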

Frank Farmer
+1 Point well made sir
Saif Bechan
A: 

If you're just writing logs, maybe you should take a look at the syslog functions, since syslog provides an API for this. You could also delegate the writes to a dedicated backend and do the job in an asynchronous manner.
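
For example, a minimal sketch using PHP's built-in syslog functions ('myapp' is just a placeholder identity string):

<?php
// 'myapp' is a placeholder ident; LOG_USER is the generic user-level facility.
openlog('myapp', LOG_PID | LOG_ODELAY, LOG_USER);
syslog(LOG_INFO, 'something happened');
closelog();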

dzen
+1  A: 

These are my 2p.

Unless a unique file is needed for a specific reason, I would avoid appending everything to one huge file. Instead, I would wrap (rotate) the file by time and size. A couple of configuration parameters (wrap_time and wrap_size) could be defined for this.

Also, I would probably introduce some buffering to avoid waiting for the write operation to complete.

PHP is probably not the best-suited language for this kind of operation, but it is still possible.
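
A minimal sketch of what such wrapping could look like, with placeholder paths and illustrative wrap_size/wrap_time values (buffering left out):

<?php
// Pick a log file name based on a time window (wrap_time) and roll over to a
// numbered sibling once the current file exceeds wrap_size.
// The base path and the default values are placeholders.
function wrapped_log_path($base = '/tmp/app', $wrap_size = 1048576, $wrap_time = 3600) {
    $window = time() - (time() % $wrap_time);   // start of the current time window
    $path = $base . '.' . date('Ymd-His', $window) . '.log';

    $n = 0;
    while (file_exists($path) && filesize($path) >= $wrap_size) {
        $n++;                                   // size limit hit: move to the next sibling
        $path = $base . '.' . date('Ymd-His', $window) . '.' . $n . '.log';
    }
    return $path;
}

file_put_contents(wrapped_log_path(), "log line\n", FILE_APPEND);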

Roberto Aloi