Perl hashes use a technique known as bucket chaining. All keys whose hash values (see the macro PERL_HASH_INTERNAL in hv.h) map to the same slot go in the same “bucket,” a linear list.
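To make the idea concrete, here is a toy sketch of bucket chaining written in Perl itself. It is only an illustration: toy_hash, toy_store, toy_fetch and the fixed bucket count of 8 are made up for this example, and the hash function is not the one perl actually uses (the real one lives behind PERL_HASH_INTERNAL and also resizes the bucket array as the hash grows).

```perl
use strict;
use warnings;

# Toy hash function (NOT perl's real one): a simple multiply-and-add
# over the character codes, kept small with a modulus.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 33 + ord($_)) % 65_536 for split //, $key;
    return $h;
}

# A fixed number of buckets; each bucket is a list of [key, value] pairs.
my $n_buckets = 8;
my @buckets   = map { [] } 1 .. $n_buckets;

# Store a pair: pick the bucket from the hash value, then scan its chain.
sub toy_store {
    my ($key, $value) = @_;
    my $chain = $buckets[ toy_hash($key) % $n_buckets ];
    for my $entry (@$chain) {
        if ($entry->[0] eq $key) {    # key already present: overwrite
            $entry->[1] = $value;
            return;
        }
    }
    push @$chain, [ $key, $value ];   # new key: append to the chain
}

# Fetch a value: same bucket selection, then a linear scan of the chain.
sub toy_fetch {
    my ($key) = @_;
    my $chain = $buckets[ toy_hash($key) % $n_buckets ];
    for my $entry (@$chain) {
        return $entry->[1] if $entry->[0] eq $key;
    }
    return;
}

toy_store($_, length $_) for qw(alice bob carol dave erin frank);

# Count non-empty buckets, mimicking the "used/allocated" report.
my $used = grep { @$_ } @buckets;
print "$used/$n_buckets\n";
```

The longer a chain gets, the more string comparisons each lookup costs, which is why a lopsided used/allocated ratio is a warning sign.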
According to the perldata documentation:
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, a fatal error will result, since this bucket usage information is currently not available for tied hashes.
To see whether your dataset has a pathological distribution, you could inspect the various levels in scalar context, e.g.,
    print scalar(%$this), "\n",
          scalar(%{ $this->{date} }), "\n",
          scalar(%{ $this->{date}{"school 1"} }), "\n",
          ...
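One caveat: since Perl 5.26, a hash in scalar context returns the number of keys rather than the used/allocated string quoted above. The sketch below gets the same information from Hash::Util's bucket_info instead, assuming a perl recent enough to provide it (5.18 or later, as far as I know). The report_buckets helper and the sample data are hypothetical, made up for illustration; substitute your own structure.

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_info);

# Recursively report key count and bucket usage for each nested hash,
# descending at most $max_depth levels below the starting hash.
sub report_buckets {
    my ($hash, $label, $max_depth) = @_;
    my @lines;

    # bucket_info returns (keys, allocated buckets, used buckets, ...).
    my ($keys, $buckets, $used) = bucket_info($hash);
    push @lines, sprintf "%-30s keys=%d used=%d allocated=%d",
        $label, $keys, $used, $buckets;

    return @lines if $max_depth <= 0;

    for my $k (sort keys %$hash) {
        my $v = $hash->{$k};
        next unless ref($v) eq 'HASH';   # only plain nested hashes
        push @lines, report_buckets($v, $label . "->{$k}", $max_depth - 1);
    }
    return @lines;
}

# Hypothetical sample data shaped like the structure in the question.
my $this = {
    date => {
        "school 1" => { alice => 1, bob => 2 },
        "school 2" => { carol => 3 },
    },
};

print "$_\n" for report_buckets($this, '$this', 2);
```

A level where the used count stays tiny while the key count is large is the pathological case the documentation describes. The exact used and allocated numbers vary between runs and perl versions, because hash seeds are randomized.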
For a somewhat dated overview, see "How Hashes Really Work" at perl.com.
Modestly shortening the students' names, which are keys four levels down, won't make a significant difference. In general, the perl implementation has a strong bias toward throwing memory at problems. It ain't your father's FORTRAN.