Associative arrays in Perl 5 are implemented with hash tables which have amortized O(1) (i.e. constant time) insert and lookup. That is why we tend to call them hashes not associative arrays.
It is difficult to find documentation that states that Perl 5 uses a hash table for implementation of associative arrays (besides the fact that we call associative arrays "hashes"), but there is at least this in perldoc perlfaq4
What happens if I add or remove keys from a hash while iterating over it?
(contributed by brian d foy)
The easy answer is "Don't do that!"
If you iterate through the hash with each(), you can delete the key
most recently returned without worrying about it. If you delete or add
other keys, the iterator may skip or double up on them since perl may
rearrange the hash table. See the entry for "each()" in perlfunc.
An even better quote from perldoc perldata
:
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true; more
precisely, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a slash.
This is pretty much useful only to find out whether Perl's internal
hashing algorithm is performing poorly on your data set. For example,
you stick 10,000 things in a hash, but evaluating %HASH in scalar
context reveals "1/16", which means only one out of sixteen buckets has
been touched, and presumably contains all 10,000 of your items. This
isn't supposed to happen. If a tied hash is evaluated in scalar
context, a fatal error will result, since this bucket usage information
is currently not available for tied hashes.
Of course, O(1) is only theoretical performance. In the real world we do not have perfect hashing functions, so hashes do slow down as they get larger, and there are some degenerate cases that turn a hash into O(n), but Perl does its best to prevent this from happening. Here is a benchmark of Perl hashes with 10, 100, 1,000, 10,000, 100,000 keys:
Perl version 5.012000
Rate 10^5 keys 10^4 keys 10^3 keys 10^2 keys 10^1 keys
10^5 keys 5688029/s -- -1% -4% -7% -12%
10^4 keys 5748771/s 1% -- -3% -6% -11%
10^3 keys 5899429/s 4% 3% -- -4% -9%
10^2 keys 6116692/s 8% 6% 4% -- -6%
10^1 keys 6487133/s 14% 13% 10% 6% --
Here is the benchmark code:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
print "Perl version $]\n";
my %subs;
for my $n (1 .. 5) {
my $m = 10 ** $n;
keys(my %h) = $m; #preallocated the hash so it doesn't have to keep growing
my $k = "a";
%h = ( map { $k++ => 1 } 1 .. $m );
$subs{"10^$n keys"} = sub {
return @h{"a", $k};
}
};
Benchmark::cmpthese -1, \%subs;