tags:
views: 313
answers: 5

I've read previous answers here about caching in PHP, and the articles they link to. I've checked out the oft-recommended PEAR Cache_Lite, QuickCache, and WordPress Super Cache. (Sorry - apparently I'm allowed to hyperlink only once.)

Either none of them deals with concurrency issues, or none explicitly documents that it does.

Can anyone point me in the direction of a PHP page cache that handles concurrency?

This is on a shared host, so memcache and opcode caches are unfortunately not an option. I don't use a templating engine and would like to avoid taking a dependency on one. WP Super Cache's approach is preferable - i.e. storing static files under wwwroot to let Apache serve them - but not a requirement.

Thanks!

P.S. Examples of things that should be handled automatically:

  1. Apache / the PHP cache is in the middle of reading a cached file. The cached file becomes obsolete and deletion is attempted.
  2. A cached file was deleted because it was obsolete. A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.
A: 
  1. Under Linux, generally, the file will remain "open" for read, even if it's "deleted", until the process closes it. This is built into the system, and can sometimes cause huge discrepancies in reported disk usage (deleting a 3G file while it's still "open" means that space is still counted as in use until the process closes it). I'm unsure whether the same is true under Windows.
  2. Assuming a journalling filesystem (most Linux filesystems, and NTFS), the file should not be seen as "created" until the process closes it. Until then it should show up as a non-existent file!
Mez
Edit thanks, this is informative. The "What happens?" questions in the P.S. were merely rhetorical. I've updated the question to reflect this.
WalterGR
Ack! I forgot that I wrote "Implementing / finding" in the question title. I removed "implementing." If it comes to that, I'll open another question. Thanks again.
WalterGR
A: 

Assuming a journalling filesystem (most Linux filesystems, and NTFS), the file should not be seen as "created" until the process closes it. Until then it should show up as a non-existent file!

Nope: it is visible as soon as it is created; you have to lock it. rename() is atomic, though. So you could open(), write(), close(), rename(), but this will not prevent the same cache item from being re-created twice at the same time.
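The write-then-rename pattern described above can be sketched like this (a minimal sketch; `cache_page()` and its arguments are hypothetical names, not from the original post):

```php
<?php
// Minimal sketch of the open()/write()/close()/rename() pattern.
function cache_page(string $cacheFile, string $html): void
{
    // Write to a temp file in the same directory, so rename() stays on
    // one filesystem and is therefore atomic.
    $tmp = tempnam(dirname($cacheFile), 'cache_');
    file_put_contents($tmp, $html);
    chmod($tmp, 0644); // tempnam() creates the file 0600; let Apache read it
    // Readers now see either the old file or the new one, never a
    // half-written file.
    rename($tmp, $cacheFile);
}
```

As the answer notes, this makes the publish atomic but does nothing to stop two processes generating the same entry at once.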

A cached file was deleted because it was obsolete. A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.

If it is not locked, a half-complete file will be served, or two processes will try to regenerate the same file at the same time, giving "interesting" results.
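To avoid both problems at once (serving a half-complete file, and two processes regenerating the same entry), one common approach is an exclusive flock() on a sidecar lock file. A minimal sketch, with hypothetical names:

```php
<?php
// Sketch: only the process that wins the non-blocking exclusive lock
// regenerates the entry; everyone else waits, then reads the result.
function regenerate_once(string $cacheFile, callable $generate): string
{
    $lock = fopen($cacheFile . '.lock', 'c');
    if (flock($lock, LOCK_EX | LOCK_NB)) {
        // We won the race: rebuild, publish atomically via rename(), unlock.
        $html = $generate();
        $tmp  = tempnam(dirname($cacheFile), 'cache_');
        file_put_contents($tmp, $html);
        rename($tmp, $cacheFile);
        flock($lock, LOCK_UN);
    } else {
        // Another process is regenerating: block until it finishes,
        // then read the freshly written file.
        flock($lock, LOCK_EX);
        flock($lock, LOCK_UN);
        $html = file_get_contents($cacheFile);
    }
    fclose($lock);
    return $html;
}
```

Note that flock() is advisory, so every reader and writer has to go through the same code path for the lock to mean anything.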

peufeu
A: 

Hi,

It seems PEAR::Cache_Lite has some safeguards to deal with concurrency issues.
If you take a look at the manual page of the constructor Cache_Lite::Cache_Lite, you have these options:

fileLocking enable / disable fileLocking. Can avoid cache corruption under bad circumstances.

writeControl enable / disable write control. Enabling write control will slightly slow the cache writing but not the cache reading. Write control can detect some corrupt cache files, but it is not a perfect control.

readControl enable / disable read control. If enabled, a control key is embedded in the cache file and this key is compared with the one calculated after the reading.

readControlType Type of read control (only if read control is enabled). Must be 'md5' (for an md5 hash control (best but slowest)), 'crc32' (for a crc32 hash control (slightly less safe but faster)) or 'strlen' (for a length-only test (fastest))

Which ones to use is still up to you, and will depend on what kind of performance you are ready to sacrifice, and on the risk of concurrent access that probably exists in your application.
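Putting those options together, a Cache_Lite setup might look like this (a sketch; the cache directory, lifetime, and `build_page()` are hypothetical, not from the original answer):

```php
<?php
// Requires the PEAR Cache_Lite package to be installed.
require_once 'Cache/Lite.php';

$cache = new Cache_Lite(array(
    'cacheDir'        => '/tmp/cache/',
    'lifeTime'        => 300,      // seconds
    'fileLocking'     => true,     // flock() around reads/writes
    'writeControl'    => true,     // re-read after write to verify
    'readControl'     => true,     // verify embedded control key on read
    'readControlType' => 'crc32',  // 'md5' | 'crc32' | 'strlen'
));

if (($page = $cache->get('home')) === false) {
    $page = build_page();          // hypothetical page generator
    $cache->save($page, 'home');
}
echo $page;
```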


You might also want to take a look at Zend_Cache_Frontend_Output, to cache a page, using something like Zend_Cache_Backend_File as backend.

That one seems to support some kind of safety as well: the same kind of stuff that Cache_Lite already gives you (so I won't copy-paste it a second time)
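For reference, the Output frontend with the File backend is used roughly like this (a sketch against Zend Framework 1.x; `render_page()` and the paths are hypothetical):

```php
<?php
require_once 'Zend/Cache.php';

$frontendOptions = array('lifetime' => 300);
$backendOptions  = array('cache_dir' => '/tmp/cache/', 'file_locking' => true);

$cache = Zend_Cache::factory('Output', 'File', $frontendOptions, $backendOptions);

// start() echoes the cached output and returns true on a hit;
// otherwise it begins output buffering until end() is called.
if (!$cache->start('home')) {
    echo render_page();  // hypothetical
    $cache->end();       // stores the buffered output under the id 'home'
}
```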


As a sidenote, if your website runs on a shared host, I suppose it doesn't have that many users? So the risks of concurrent access are probably not that high, are they?

Anyway, I probably would not search any further than what those two frameworks propose: it is probably already more than enough for the needs of your application :-)

(I've never seen any caching mechanism "more secure" than what those allow you to do... And I've never run into a catastrophic concurrency problem of that sort yet, in 3 years of PHP development)


Anyway : have fun !

Pascal MARTIN
PEAR Cache_Lite doesn't handle all concurrency issues - for example, reading from the cache while a file is locked for writing. (Or perhaps I should say: concurrency issues are punted to application code.) That's a no-go for me. You asked, "...So the risks of concurrent access are probably not that high, are they?" While visits obviously follow a larger trend (e.g. more at mid-day, more on Wednesday), page requests from second to second are random. Concurrency is an issue for all websites.
WalterGR
A: 

You could cache pages in the database, just create a simple "name,value" table and store cached pages on it.
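A minimal version of that idea, sketched against MySQL via PDO (the table name, schema, and credentials are hypothetical):

```php
<?php
// Hypothetical schema:
//   CREATE TABLE page_cache (
//       name    VARCHAR(255) PRIMARY KEY,
//       value   MEDIUMTEXT NOT NULL,
//       expires INT UNSIGNED NOT NULL
//   );
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Write: REPLACE (or INSERT ... ON DUPLICATE KEY UPDATE) is a single
// atomic statement, so readers never see a half-written page.
$stmt = $pdo->prepare(
    'REPLACE INTO page_cache (name, value, expires) VALUES (?, ?, ?)'
);
$stmt->execute(array('home', $html, time() + 300));

// Read: only serve entries that have not expired.
$stmt = $pdo->prepare(
    'SELECT value FROM page_cache WHERE name = ? AND expires > ?'
);
$stmt->execute(array('home', time()));
$page = $stmt->fetchColumn();
```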

rayed
Thanks. While a database would avoid the corruption issues possible with files (due to the "C" in ACID) there are still other things to handle properly, e.g. the "I" in ACID. It's not a matter of "just create a DB table."
WalterGR
+1  A: 

I would be tempted to modify one of the existing caches. Zend Framework's cache should be able to do the trick. If not, I would change it.

You could create a really primitive locking strategy. The database could be used to track all of the cached items, allow locking for update, allow people to wait for someone else's update to complete, ...

That would handle your ACID issues. You could set the lock timeout for someone else's update to a very short period, or possibly have it just skip the cache altogether for that round trip, depending on your server load/capacity and the cost of producing the cached content.
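One way to sketch that database-backed lock is MySQL's GET_LOCK(), which gives you a named, cross-process lock with a timeout (the lock name, timeout, and helper functions here are hypothetical):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// GET_LOCK() returns 1 if we acquired the lock, 0 on timeout, NULL on error.
$got = $pdo->query("SELECT GET_LOCK('cache:home', 2)")->fetchColumn();

if ($got == 1) {
    // We hold the lock: regenerate and store the entry, then release it.
    rebuild_cache('home');                            // hypothetical
    $pdo->query("SELECT RELEASE_LOCK('cache:home')");
} else {
    // Couldn't get the lock in time: skip the cache for this round trip
    // and render the page directly, as suggested above.
    echo render_page();                               // hypothetical
}
```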

Jacob

TheJacobTaylor