After reading over zend/zend_hash.h and ext/standard/array.c I think I have found the answer (Thankyou Chris and gumbo for the suggestions).
The PHP array is a chained hash table (lookup of O(c) and O(n) on key collisions) that allows for int and string keys. It uses 2 different hashing algorithms to fit the two types into the same hash key space. Also each value stored in the hash is linked to the value stored before it and the value stored after (linked list). It also has a temporary pointer which is used to hold the current item so the hash can be iterated.
The catch for the array_rand
function is that in order to assure that key is truly random, the array_rand
function must iterate over the array rand(0, count($array))
times (O(n)). This is because there is no way to move to an offset in the hash table in O(c) time because there is no guarantee that there isn't missing keys in the range.
This discovery has somewhat troubled me, because it means there is NO datatype in PHP that has normal C array characteristics. Now most of the time this is ok, because hash lookups are very faster, but it's faults show through in cases like array_rand
.
Another thing that also troubled me was the difference between the implementation of array_key_exists
and in_array
. array_key_exists
uses the hash lookup (most of the time O(c)) to see if a key is in the array, while in_array
has to linearly search the hash (O(n)).
Consider the two examples below:
in_array version
$array = range(0, 100000);
if( in_array( $random_key, $array ) ) {
//we found a value
}
array_key_exists version
$array = array_fill_keys( range(0, 100000), NULL );
if( array_key_exists( $random_key, $array ) ) {
//we found a value, err key
}
While the in_array version looks much cleaner and simpler to understand, it's actually very slow on large arrays (O(n)), where array_key_exists (despite is annoyingly long name) is very fast (O(c) or close).
In conclusion:
I wish there was a transparent flag in the zend HashTable data-structure that would be set in cases where the array was created using array_push
or array[] = $value
that would allow for scaling like a C array instead of a linked list.