I have a scenario where one PHP process is writing a file about 3 times a second, and then several PHP processes are reading this file.

This file is essentially a cache. Our website polls very insistently for data that changes constantly, and we don't want every visitor to hit the DB every time they poll, so we have a cron process that reads the DB 3 times per second, processes the data, and dumps it to a file that the polling clients can then read.

The problem I'm having is that, sometimes, opening the file to write to it takes a long time, sometimes even up to 2-3 seconds. I'm assuming that this happens because it's being locked by reads (or by something), but I don't have any conclusive way of proving it, and from what I understand of the documentation, PHP shouldn't be locking anything. This happens every 2-5 minutes, so it's pretty common.

In the code, I'm not doing any kind of locking, and I pretty much don't care if that file's information gets corrupted, if a read fails, or if data changes in the middle of a read. I do care, however, if writing to it takes 2 seconds, essentially because the process that has to happen thrice a second has now skipped several beats.

I'm writing the file with this code:

$handle = fopen(DIR_PUBLIC . 'filename.txt', "w");
fwrite($handle, $data);
fclose($handle);

And I'm reading it directly with:

file_get_contents('filename.txt')

(it's not getting served directly to the clients as a static file, I'm getting a regular PHP request that reads the file and does some basic stuff with it)

The file is about 11kb, so it doesn't take a lot of time to read/write. Well under 1ms.

This is a typical log entry when the problem happens:

  Open File:    2657.27 ms
  Write:    0.05984 ms
  Close:    0.03886 ms
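
(The timings come from microtime() calls around each step, roughly like this; simplified sketch, not the exact logging code:)

$t0 = microtime(true);
$handle = fopen(DIR_PUBLIC . 'filename.txt', "w");
$t1 = microtime(true);
fwrite($handle, $data);
$t2 = microtime(true);
fclose($handle);
$t3 = microtime(true);
// report each step in milliseconds, matching the log format above
printf("Open File: %.2f ms\nWrite: %.5f ms\nClose: %.5f ms\n",
    ($t1 - $t0) * 1000, ($t2 - $t1) * 1000, ($t3 - $t2) * 1000);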

Not sure if it's relevant, but the reads happen in regular web requests, through Apache, while the write is a regular "command line" PHP execution started by Linux's cron; it's not going through Apache.

Any ideas of what could be causing this big delay in opening the file?
Any pointers on where I could look to help me pinpoint the actual cause?

Alternatively, can you think of something I could do to avoid this? For example, I'd love to be able to set a 50ms timeout to fopen, and if it didn't open the file, it just skips ahead, and lets the next run of the cron take care of it.
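
For illustration, one way to approximate that skip-ahead behaviour is a non-blocking flock() instead of a real fopen timeout; this is only a sketch, and it only helps if the delay really is lock contention rather than something inside fopen() itself:

$handle = fopen(DIR_PUBLIC . 'filename.txt', "c"); // "c" opens for writing without truncating
if ($handle !== false && flock($handle, LOCK_EX | LOCK_NB)) {
    // got the lock immediately: replace the contents
    ftruncate($handle, 0);
    fwrite($handle, $data);
    flock($handle, LOCK_UN);
    fclose($handle);
} elseif ($handle !== false) {
    // couldn't lock right away: skip this beat and let the next cron run write
    fclose($handle);
}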

Again, my priority is to keep the cron beating thrice a second, all else is secondary, so any ideas, suggestions, anything is extremely welcome.

Thank you!
Daniel

+2  A: 

I can think of 3 possible problems:

  • the file gets locked when reading/writing to it by lower-level PHP system calls without you knowing it. This should block the file for 1/3 s at most, but you're seeing periods longer than that.
  • the fs cache starts an fsync() and the whole system blocks reads/writes to the disk until that is done. As a fix you may try installing more RAM, upgrading the kernel, or using a faster hard disk
  • your "caching" solution is not distributed, and it hits the worst-performing piece of hardware in the whole system many times a second... meaning that you cannot scale it further by simply adding more machines, only by increasing the hdd speed. You should take a look at memcache or APC, or maybe even shared memory http://www.php.net/manual/en/function.shm-put-var.php

Solutions I can think of:

  • put that file in a ramdisk http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/ . This should be the simplest way to avoid hitting the disk so often, without other major changes.
  • use memcache. It's very fast when used locally, it scales well, and it's used by "big" players. http://www.php.net/manual/en/book.memcache.php
  • use shared memory. It was designed for what you are trying to do here... having a "shared memory zone" (see the sketch after this list)
  • change the cron scheduler to update less often, or implement some kind of event system so the cache is only updated when necessary, not on a time basis
  • make the "writing" script write to 3 different files, and make the "readers" read from one of them, randomly. This may allow the locking to be "distributed" across more files and may reduce the chances that a given file is locked when writing to it... but I doubt it would bring any noticeable benefit.
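
A minimal sketch of the shared memory idea (the key, segment size, and variable slot are arbitrary examples, and PHP must have the sysvshm extension enabled):

// writer (the cron script): publish the latest payload
$key = ftok(DIR_PUBLIC . 'filename.txt', 'c'); // any agreed-upon key source works
$shm = shm_attach($key, 65536);                // 64 KB segment, plenty for an ~11 KB payload
shm_put_var($shm, 1, $data);                   // slot 1 holds the cache payload
shm_detach($shm);

// readers (the Apache requests): compute the same key and fetch the payload back
$key = ftok(DIR_PUBLIC . 'filename.txt', 'c');
$shm = shm_attach($key, 65536);
$cached = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : null;
shm_detach($shm);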
Quamis
Quamis, thank you very much for your reply. Some comments: I'm not a Linux expert, but it seems to be using 25% of the RAM, if I'm reading htop's output correctly. I know this caching system is bad. If this were ASP.Net instead of PHP, I'd store that data in memory without a second thought. But this is REALLY small to start using memcache, or even to have more than one server. I know this is going to have a scalability problem if the site keeps growing, but for now, moving to a more "serious" caching system doesn't make much sense.
Daniel Magliola
I did NOT know about PHP shared memory; I thought that wasn't possible with PHP. That looks VERY interesting indeed... I'm going to look into that. To be honest, I have no experience with memcached, and I'd like to avoid having to install a new system, given my sysadmin expertise. We will if we have to, but for now, I'd rather take a simpler approach. Thank you very much!!
Daniel Magliola
One question... Will PHP's shared memory work across the cron process and the Apache PHP processes? I have a cron process that runs for 5 minutes and ends (and gets restarted by the cron) writing, and then the Apache PHP processes responding to HTTP requests reading this shared memory. Will that work properly?
Daniel Magliola
@Daniel: yes, it should work across processes, that's the reason it was implemented:)
Quamis
I ended up going with the ramdisk solution, which had the least impact on the code, and it worked LIKE A CHARM. Thank you so much!!
Daniel Magliola
glad i could help:)
Quamis
A: 

You should use a really fast solution if you want to guarantee consistently low open times. Your OS may be doing disk syncs, database file commits, or other things that you cannot work around.

I suggest using memcached, Redis, or even MongoDB for such tasks. You could even write your own caching daemon, even in PHP (although this is totally unnecessary and can be tricky).
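
For example, with PHP's Memcache extension the pattern would look roughly like this (host, port, key name, and expiry are just placeholders):

$mc = new Memcache();
$mc->connect('127.0.0.1', 11211);

// cron writer: replace the cached payload on every beat
$mc->set('polling_cache', $data, 0, 5); // 5-second expiry as a safety net

// web readers:
$payload = $mc->get('polling_cache');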

If you are absolutely, positively sure that you can only solve this task with this file cache, and you are on Linux, try using a different disk I/O scheduler, like deadline, OR (cfq AND decreasing the PHP process priority to -3 / -4).

netom
To be honest, I know there are better ways of doing this. I inherited a big codebase that did this, and there isn't much budget (or need really) to reengineer the whole thing. I know this is a crappy approach, but it should be *good enough* at least for a long while given our traffic levels. So I'm trying to work around this without rewriting a lot of code. If I can't, then I'll go into the more complex solutions.
Daniel Magliola