tags:

views: 1394
answers: 10

I am curious: is there a size limit on serialize() in PHP? Would it be possible to serialize an array with 5,000 keys and values so it can be stored in a cache?

I am hoping to cache a user's friend list on a social network site. The cache will need to be updated fairly often, but it will also need to be read on almost every page load.

On a single-server setup I am assuming APC would be better than memcache for this.
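
To give an idea of what I mean, here is a rough sketch of the kind of thing I have in mind (assuming the pecl/memcache extension; the key name and $userId are just placeholders):

$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);

$friends = array(/* ~5,000 friend IDs => names */);
// Memcache::set() serializes non-scalar values such as this array for you.
$memcache->set('friends_' . $userId, $friends, 0, 60);  // 0 = no compression, 60s TTL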

+2  A: 

The only practical limit is your available memory, since serialization involves creating a string in memory.

Paul Dixon
+2  A: 

There's no limit enforced by PHP. serialize() returns a byte-stream representation (string) of the serialized structure, so you would just get a large string.
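
For instance, just to illustrate the output format (a tiny example of my own):

$data = array('hello' => 'world', 42);
echo serialize($data);
// a:2:{s:5:"hello";s:5:"world";i:0;i:42;}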

zombat
+3  A: 

The serialize() function is only limited by available memory.

Byron Whitlock
+1  A: 

There is no limit, but remember that serialization and unserialization have a cost.

Unserialization is extremely costly.

A less costly way of caching that data would be via var_export(), like so (since PHP 5.1.0, it also works on objects):

$largeArray = array(1, 2, 3, 'hello' => 'world', 4);

file_put_contents(
    'cache.php',
    "<?php\nreturn " . var_export($largeArray, true) . ';'
);

You can then simply retrieve the array by doing the following:

$largeArray = include('cache.php');

Resources are usually not cache-able.

Unfortunately, if you have circular references in your array, you'll need to use serialize().
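
For example (a quick illustration of my own): a self-referential array round-trips through serialize(), but var_export() cannot express the cycle:

$a = array('name' => 'node');
$a['self'] = &$a;

echo serialize($a);    // the cycle is encoded as an R: reference
var_export($a, true);  // warns that circular references are not handled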

Andrew Moore
This sounds nice, except I am not sure about my case: with 100,000 members on my site, this would create something like 100,000 files. When I said cache I should have clarified APC or else memcache.
jasondavis
You should have specified that in your OP.
Andrew Moore
var_export would require eval() (which is icky). It's also at least 3x slower to var_export than it is to serialize, and var_export would use more post-serialized memory since it's not quite as compact a data structure.
Justin
+12  A: 

As quite a few other people have answered already, just for fun, here's a very quick benchmark (do I dare call it that?); consider the following code:

$num = 1;

// 5,000-element array of identical short strings
$list = array_fill(0, 5000, str_repeat('1234567890', $num));

$before = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $str = serialize($list);
}
$after = microtime(true);

var_dump($after - $before);          // elapsed time, in seconds
var_dump(memory_get_peak_usage());   // peak memory, in bytes

I'm running this on PHP 5.2.6 (the one bundled with Ubuntu Jaunty).
And, yes, there are only values, no keys, and the values are quite simple: no objects, no sub-arrays, nothing but strings.

For $num = 1, you get:

float(11.8147978783)
int(1702688)

For $num = 10, you get:

float(13.1230671406)
int(2612104)

And, for $num = 100, you get:

float(63.2925770283)
int(11621760)

So, it seems the bigger each element of the array is, the longer it takes (which seems fair, actually). But for elements 100 times bigger, it doesn't take 100 times longer...


Now, with an array of 50,000 elements instead of 5,000, this part of the code is changed:

$list = array_fill(0, 50000, str_repeat('1234567890', $num));

With $num = 1, you get:

float(158.236332178)
int(15750752)

Considering the time it took for 1, I won't be running this for either $num = 10 or $num = 100...


Yes, of course, in a real situation you wouldn't be doing this 10,000 times; so let's try with only 10 iterations of the for loop.

For $num = 1:

float(0.206310987473)
int(15750752)

For $num = 10:

float(0.272629022598)
int(24849832)

And for $num = 100:

float(0.895547151566)
int(114949792)

Yeah, that's almost 1 second -- and quite a bit of memory used ^^
*(No, this is not a production server: I have a pretty high memory_limit on this development machine ^^)*


So, in the end, to be a bit shorter than those numbers -- and, yes, you can have numbers say whatever you want them to -- I wouldn't say there is a "limit" as in "hardcoded" in PHP, but you'll end up facing one of these:

  • max_execution_time (generally, on a webserver, it's never more than 30 seconds)
  • memory_limit (on a webserver, it's generally not much more than 32 MB) -- both are easy to check at runtime; see the sketch just after this list
  • the load your webserver will have: while one of those big serialize loops was running, it used one of my CPUs; if you have quite a few users on the same page at the same time, I'll let you imagine what that will do ;-)
  • the patience of your users ^^
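
If you're not sure what limits you're running under, the first two are trivial to check at runtime (a quick sketch; these are standard php.ini directive names):

// Just checking what this script is actually running under.
var_dump(ini_get('max_execution_time'));  // e.g. "30"
var_dump(ini_get('memory_limit'));        // e.g. "32M"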

But unless you are really serializing long arrays of big data, I am not sure it will matter that much...
And you must take into consideration the amount of time/CPU load that using that cache might help you gain ;-)

Still, the best way to know would be to test by yourself, with real data ;-)


And you might also want to take a look at what Xdebug can do when it comes to profiling: this kind of situation is exactly the sort of thing it's useful for!

Pascal MARTIN
+1 for the cool "benchmark". Interesting
Byron Whitlock
Serializing the same data of the same datatype over and over again is not exactly a real benchmark. Plus, the real cost is not serializing, it's unserializing.
Andrew Moore
True (that's why I wasn't sure I could call it a benchmark ^^), and very true (because it's unserializing that's going to be done the most -- otherwise, there is absolutely no reason to put that in cache).
Pascal MARTIN
Thanks, that's useful.
jasondavis
**@jasondavis:** Using var_export() per my answer will save you the cost of unserialization.
Andrew Moore
@Andrew: If he is storing this into APC or memcache, I'm not sure var_export would do the trick: there will be no file to include, and off the top of my head I don't see a "nice way" to get that data back without evaluating it; do you have a way? (I might just not be thinking of it ^^)
Pascal MARTIN
(Sorry: I only saw, after posting my previous comment, that the OP had just been edited to add a mention of APC/memcached; I thought it was there before -- hence my previous comment.)
Pascal MARTIN
Another real limit is how much you can assign to a key in memcache/APC -- memcache limits one key to one page of memory (usually 1 MB); APC may do the same. BTW, I did see a benchmark of include() vs unserialize(file_get_contents()) and surprisingly the unserialize won. That didn't take into account the memory that file_get_contents would have used, though.
Justin
I don't know the size limit of APC entries (I've seen things ranging up to a couple hundred KB, maybe once or twice more than 1 MB), but I don't like seeing too many "big" entries -- and I consider that 50k is already quite big (that's not based on any number or benchmark, just on the average entry size I have on a not-too-small website). The results of that benchmark are surprising, indeed; did they use any opcode cache?
Pascal MARTIN
@Pascal: not sure -- I'm trying to run your code above with serialize/unserialize, var_export/include and var_export/eval to compare.
Justin
@Justin: OK :-) if you get any interesting results, let us know ;-)
Pascal MARTIN
A: 

Nope, there is no limit, and this:

set_time_limit(0);
ini_set('memory_limit', -1);

unserialize('s:2000000000:"a";');

is why you should have safe_mode = On or an extension like Suhosin installed; otherwise it will eat up all the memory in your system.

Alix Axel
A: 

I think json_encode() is better than serialize(). It has a drawback in that associative arrays and objects are not distinguished, but the string result is smaller and easier for a human to read, and therefore also easier to debug and edit.
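
To illustrate the associative-array/object ambiguity (a small example of my own):

$data = array('hello' => 'world');
$json = json_encode($data);          // {"hello":"world"}

var_dump(json_decode($json));        // comes back as a stdClass object by default
var_dump(json_decode($json, true));  // pass true to get an associative array instead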

Thinker
**json_encode:** works fine for primitive types, but as soon as you have objects, you can't use it without losing data fidelity.
Andrew Moore
A: 

If you want to cache it (so I assume performance is the issue), use apc_add() instead, to avoid the performance hit of converting it to a string and to gain an in-memory cache.
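
For the friend-list case, something along these lines (a sketch only; the key name, the TTL and load_friends_from_db() are placeholders):

$key = 'friends_' . $userId;
$friends = apc_fetch($key, $success);

if (!$success) {
    $friends = load_friends_from_db($userId);  // hypothetical loader
    apc_add($key, $friends, 300);              // stored as-is, no manual serialize(); 5 minute TTL
}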

As stated above, the only size limit is available memory.

A few other gotchas: serialized data is not portable between multi-byte and single-byte character encodings, and PHP 5 classes include NUL bytes that can cause havoc with code that doesn't expect them.
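
A quick illustration of those NUL bytes (my own example): private and protected properties get NUL-delimited prefixes in the serialized string:

class Foo {
    private $secret = 'x';
    protected $shared = 'y';
}

// Make the NUL bytes visible:
echo str_replace("\0", '\0', serialize(new Foo()));
// O:3:"Foo":2:{s:11:"\0Foo\0secret";s:1:"x";s:9:"\0*\0shared";s:1:"y";}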

A: 

Your use case sounds like you're better off using a database for this rather than relying solely on PHP's available resources. The advantage of using something like MySQL instead is that it's specifically engineered with memory management in mind for things like storage and lookup.

It's really no fun constantly serializing and unserializing data just to update or change a few pieces of information.
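
For example, something like this (a sketch, assuming a hypothetical friends(user_id, friend_id) table and an existing PDO connection in $pdo):

$stmt = $pdo->prepare('SELECT friend_id FROM friends WHERE user_id = ?');
$stmt->execute(array($userId));
$friendIds = $stmt->fetchAll(PDO::FETCH_COLUMN);  // flat array of friend IDs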

Robert Elwell
+1  A: 

Ok... more numbers! (PHP 5.3.0 OSX, no opcode cache)

@Pascal's code on my machine for $num = 1 at 10k iterations produces:

float(18.884856939316)
int(1075900)

I add unserialize() to the above, like so:

$num = 1;

$list = array_fill(0, 5000, str_repeat('1234567890', $num));

$before = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $str = serialize($list);
    $list = unserialize($str);   // round-trip: unserialize what was just serialized
}
$after = microtime(true);

var_dump($after-$before);
var_dump(memory_get_peak_usage());

produces

float(50.204112052917)
int(1606768)

I assume the extra 600k or so are the serialized string.

I was also curious about var_export() and its include/eval partner. Using $str = var_export($list, true); instead of serialize() in the original produces:

float(57.064643859863)
int(1066440)

so just a little less memory (at least for this simple example) but way more time already.

Adding eval('$list = ' . $str . ';'); in place of unserialize() in the above produces:

float(126.62566018105)
int(2944144)

This indicates there's probably a memory leak somewhere when doing eval() :-/

So again, these aren't great benchmarks (I really should isolate the eval/unserialize by putting the string in a local variable or something, but I'm being lazy), but they show the associated trends. var_export() seems slow.
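
For reference, the "isolated" variant I mean would look something like this (just a sketch, not what was actually timed above):

// Serialize once, outside the loop, then time only unserialize().
$str = serialize($list);

$before = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $copy = unserialize($str);
}
$after = microtime(true);

var_dump($after - $before);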

Justin
Thanks for those tests!
Pascal MARTIN