I've played with this for quite a while but am at a bit of a loss as to what to do. I'm using APC 3.1.3p1 on CentOS 5 with PHP 5.2.5. APC is acting as both the opcode cache and the user cache. This server mostly runs Drupal 6 sites, using the CacheRouter module for APC cache support. I was running APC 3.0.19 for a while, but it was causing Apache to lock up occasionally (a documented bug in that version of APC), which is why I'm on 3.1.3p1.

I've configured APC to have 512 MBytes of memory (mmap).

The symptoms are a little intermittent but starting from an empty cache this is generally what I see:

  • The user cache fills rather slowly. Despite an initial insert rate of something like 20,000 inserts/sec, the user cache reports only a few hundred, then a few thousand entries, and grows very slowly. This may be down to write_locking being on, but I mention it in case it matters for the problem at hand. After several hours it reaches an equilibrium of around 30k entries.

  • Fragmentation sets in early and grows quickly. Within maybe 10 hours or so I'm usually at 100% fragmentation.

  • Overall (opcode + user) cache usage stabilizes around 240MB or so. It will virtually never go above that level. After a day or so I'll start seeing the User Cache Cache Full Count (UCCFC) incrementing.

At the time of writing, my UCCFC is at 62358 and growing, despite APC reporting 280MB free. I have apc.user_ttl set to 7200, but I've also tried 0 and other values, and it has little to no effect on the problem.

I suspect the problem has something to do with fragmentation. Right now my server is reporting "Fragmentation: 100.00% (280.0 MBytes out of 280.0 MBytes in 24740 fragments)" and 280 MB just so happens to be the amount of free space APC is reporting; a telling coincidence, I think. Unfortunately, I've found precious little information in the docs or elsewhere to indicate just what "fragmentation" truly means in the APC world, and there seems to be virtually nothing you can do to avoid it.

Can anyone shed any light on this problem?

A: 

See http://pecl.php.net/bugs/bug.php?id=13146. I think you should continue there or open a new bug report.

chx
A:

APC calculates the fragmentation percentage using the following formula:

(total_size_of_free_blocks_lt_5M / total_size_of_all_free_blocks) * 100

Note that it only counts free blocks smaller than 5M as fragmented.
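To make the formula concrete, here is a minimal Python sketch that applies it to a list of free block sizes. It is not APC's code; it assumes "5M" means 5 * 1024 * 1024 bytes, and the function name is hypothetical.

```python
# Assumption: "5M" is taken as 5 * 1024 * 1024 bytes.
FIVE_MB = 5 * 1024 * 1024


def fragmentation_percent(free_block_sizes):
    """Apply the fragmentation formula above to free block sizes given in bytes."""
    total_free = sum(free_block_sizes)
    if total_free == 0:
        return 0.0  # no free space at all, so nothing to report as fragmented
    # Only free blocks smaller than 5M count toward the "fragmented" total.
    small_free = sum(size for size in free_block_sizes if size < FIVE_MB)
    return small_free / total_free * 100


# One 8M block and one 2M block: only the 2M block counts, giving about 20%.
print(fragmentation_percent([8 * 1024 * 1024, 2 * 1024 * 1024]))
```

Any set of free blocks that are all under 5M gives 100%, no matter how much free space they add up to.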

I'll translate your specific case into plain English:

Fragmentation: 100.00% (280.0 MBytes out of 280.0 MBytes in 24740 fragments)

This means that all 280M of your free space is in blocks smaller than 5M. If you divide your free space by the number of fragments, you'll see that this works out to an average fragment size of ~11.6K.
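The average-fragment arithmetic can be checked with a few lines of Python. This sketch assumes APC's "MBytes" means 1024 * 1024 bytes; with decimal megabytes the result would come out somewhat smaller.

```python
# Figures taken from the fragmentation line reported above.
free_bytes = 280.0 * 1024 * 1024  # assumption: 1 MByte = 1024 * 1024 bytes
fragments = 24740                 # number of free fragments reported

average_bytes = free_bytes / fragments
print(f"average free block: {average_bytes / 1024:.1f}K")  # prints "average free block: 11.6K"
```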

This means that if you attempt to store an item that is larger than every available free block, it will not fit, and one of two things will happen depending on the apc.user_ttl configuration setting. If the TTL is set to 0, your entire user cache is flushed and the item is inserted. If the TTL is greater than 0, expired entries are flushed and the item is inserted. In both cases the cache full count is incremented. Seeing it increment as much as it does in your case is an indicator that you might be doing it wrong.
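The decision described in this paragraph can be modeled in a short Python sketch. It is a hypothetical simplification of the behavior as described here, not APC's actual source. The key point is that only the largest single free block matters, not the total free space.

```python
def handle_insert(free_block_sizes, item_size, user_ttl):
    """Hypothetical model of the behavior described above (not APC's source).

    Returns (outcome, increment_to_cache_full_count).
    """
    # The item needs ONE free block big enough; total free space is irrelevant.
    if any(size >= item_size for size in free_block_sizes):
        return "stored", 0
    # No single block fits, so the cache is treated as full.
    if user_ttl == 0:
        return "entire user cache flushed, then insert", 1
    return "expired entries flushed, then insert", 1


# Roughly 280M free, but split into ~11.7K blocks: a 20K item still "fills" the cache.
print(handle_insert([12_000] * 24740, 20_000, 7200))
```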

Here is a simple visualization of what fragmentation does to your cache over time. It represents a 32-byte cache in which each block is 1 byte.

[--------------------------------] (starts empty)
[A-------------------------------] (1B stored)
[ABB-----------------------------] (2B stored)
[ABBCCCC-------------------------] (4B stored)
... (time elapses)
[A--CCCC-EEE--GGGGGG-III--KKKLLLL]

So now if you want to store item M, which is 4B in size, you can't, because the largest available free block is 2B. This triggers a cache full count increment and a full or partial flush based on user_ttl, as explained above.
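The same reasoning can be run against the last snapshot of the 32-byte visualization. This Python sketch treats each character as a 1-byte block and `-` as free. The helper names are hypothetical.

```python
def largest_free_run(cache_map):
    """Length of the longest run of '-' (free 1-byte blocks) in the map."""
    longest = current = 0
    for cell in cache_map:
        if cell == "-":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an occupied block ends the current free run
    return longest


def can_store(cache_map, item_size):
    # An item needs a single contiguous free run at least as large as itself.
    return largest_free_run(cache_map) >= item_size


snapshot = "A--CCCC-EEE--GGGGGG-III--KKKLLLL"
print(snapshot.count("-"))         # 8  (total free bytes)
print(largest_free_run(snapshot))  # 2  (largest contiguous free block)
print(can_store(snapshot, 4))      # False  (item M does not fit)
```

Eight bytes are free in total, yet a 4-byte item cannot be stored. That is the same situation as 280M free split into ~11.6K pieces.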

Now the question is: Is this bad in your case?

I think it might be. 100% cache fragmentation isn't bad in and of itself. It's not uncommon to see that on a busy production server. However, seeing it at 100% with that much free space is a sign that something might be wrong.

  • You could be caching too much; just because the cache is there doesn't mean you should shove everything into it.
  • You could be caching with too short a TTL (per entry); low TTLs mean occupied blocks are freed more often.
  • It's also possible that you have a handful of really large items you're trying to store. At 100% fragmentation it's guaranteed that any item >= 5M won't fit. With your average free block size of 11.6K, it becomes increasingly likely that a given item won't fit as its size grows past 11.6K.

You may want to try sorting your user cache by size to see what your biggest entries are and what their TTLs are. Maybe those TTLs could be increased?
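Here is a rough Python sketch of that review, run on a list of entry records. The records below are made up. The field names `info`, `mem_size` and `ttl` are modeled on what APC's user cache listing reports, but treat them as assumptions and check them against your own build. The 11.6K threshold is the average free block size worked out above.

```python
# Hypothetical entries; field names assumed to mirror APC's user cache listing.
entries = [
    {"info": "hypothetical:page_1", "mem_size": 48_512, "ttl": 300},
    {"info": "hypothetical:variables", "mem_size": 215_040, "ttl": 0},
    {"info": "hypothetical:menu", "mem_size": 9_216, "ttl": 7200},
    {"info": "hypothetical:filter", "mem_size": 2_048, "ttl": 7200},
]

AVERAGE_FREE_BLOCK = 11_867  # ~11.6K, the average fragment size computed above


def biggest_entries(entries, limit=10):
    """Return the largest entries first so their sizes and TTLs can be reviewed."""
    return sorted(entries, key=lambda e: e["mem_size"], reverse=True)[:limit]


for entry in biggest_entries(entries, limit=3):
    flag = "above avg free block" if entry["mem_size"] > AVERAGE_FREE_BLOCK else ""
    print(f'{entry["mem_size"]:>10,}  ttl={entry["ttl"]:<6} {entry["info"]}  {flag}')
```

Large entries with short TTLs are the ones most likely to keep churning big blocks into small fragments.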

It's not really possible to give an exact diagnosis without being elbows-deep in your application(s) and their usage patterns, but this information should set you on the right track. It's quite possible that it's a non-issue and you can just let APC do its job quietly.

hobodave
I just wanted to follow up. We moved to APC 3.1.4 and its fragmentation performance is GREATLY improved. We're now getting behavior much more in line with what we'd expect.
Aaron