Analyzing an algorithm's asymptotic performance means identifying the operations that must be performed and the cost each one adds to the total. To do that, you first need to know which operations are performed, and then evaluate their costs.
Searching for a key in a balanced binary tree (which maps happen to be) requires O(log N) steps, each of which is a compound operation: compare the key for a match and, if it does not match, follow the appropriate pointer (child). This means that the overall cost is proportional to log N times the cost of those two operations. Following a pointer is a constant-time operation, O(1), while the cost of comparing keys depends on the key type. For an integer key, comparisons are fast, O(1). Comparing two strings is another story: in the worst case it takes time proportional to the sizes of the strings involved, O(L) (where I have intentionally used L as the string-length parameter instead of the more common N).
When you add up all the costs, with integers as keys the total cost is O(log N) * (O(1) + O(1)), which is equivalent to O(log N). (The O(1) terms are absorbed into the constant that the O notation silently hides.)
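To see the O(log N) comparison count in practice, here is a minimal C++ sketch, assuming the map in question is std::map (typically implemented as a red-black tree). IntCountingLess and find_comparisons are hypothetical names introduced only for illustration: the comparator behaves like std::less<int> but counts how many times the map calls it during a single lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical comparator: behaves like std::less<int> but counts its calls.
struct IntCountingLess {
    std::size_t* calls;  // counter owned by the caller
    bool operator()(int a, int b) const {
        ++*calls;
        return a < b;
    }
};

// Hypothetical helper: fills a map with keys 0..n-1 and returns how many
// key comparisons a single find(key) performs.
std::size_t find_comparisons(int n, int key) {
    std::size_t calls = 0;
    IntCountingLess less{&calls};
    std::map<int, int, IntCountingLess> m(less);
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    calls = 0;  // discard the comparisons made while inserting
    auto it = m.find(key);
    const std::size_t lookup_calls = calls;
    assert(it != m.end());  // the probe key is expected to be present
    (void)it;
    return lookup_calls;
}
```

For 65,536 keys, log2 N is 16 and a red-black tree's height is at most about twice that, so find_comparisons(1 << 16, 12345) should land in that range; the exact count depends on the library's tree layout.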
If you use strings as keys, the total cost is O(log N) * (O(L) + O(1)), where the constant-time operation is hidden by the more costly linear operation O(L), so the expression simplifies to O(L * log N). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.
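The O(L * log N) result can be observed with a comparator that counts characters instead of calls. This sketch assumes std::map with std::string keys; CharCountingLess and chars_for_find are hypothetical names, and the keys deliberately share a long common prefix, which is the worst case for string comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical comparator: lexicographic "less than" (same ordering as
// std::string's operator<) that counts every character position it inspects.
struct CharCountingLess {
    std::size_t* chars_examined;
    bool operator()(const std::string& a, const std::string& b) const {
        const std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i) {
            ++*chars_examined;
            if (a[i] != b[i])
                return static_cast<unsigned char>(a[i]) < static_cast<unsigned char>(b[i]);
        }
        return a.size() < b.size();  // equal up to n: the shorter string is smaller
    }
};

// Hypothetical helper: inserts n keys sharing a prefix of prefix_len 'x'
// characters and returns how many characters a single find() examines.
std::size_t chars_for_find(int n, std::size_t prefix_len) {
    std::size_t examined = 0;
    CharCountingLess less{&examined};
    std::map<std::string, int, CharCountingLess> m(less);
    const std::string prefix(prefix_len, 'x');
    for (int i = 0; i < n; ++i) m.emplace(prefix + std::to_string(i), i);
    const std::string probe = prefix + std::to_string(n / 2);
    examined = 0;  // count only the lookup
    auto it = m.find(probe);
    const std::size_t lookup_chars = examined;
    assert(it != m.end());
    (void)it;
    return lookup_chars;
}
```

Because the keys keep the same relative order regardless of the prefix length, both maps end up with the same shape and perform the same number of comparisons; chars_for_find(1000, 1000) and chars_for_find(1000, 10) should therefore differ roughly by the ratio of the key lengths, which is the L factor.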
Note that big-O notation is most useful as a tool for determining how an algorithm will behave as the size of the problem grows, but it hides many details that are important for raw performance.
As the simplest example, if you change the key from a generic string to a fixed-size array of 1000 characters, you can hide that cost within the constant dropped from the notation. Comparing arrays of 1000 chars is a constant-time operation that just happens to take quite a bit of time. In asymptotic notation, the lookup would just be an O(log N) operation, as with integers.
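The array-of-1000-characters case can be sketched the same way. Assuming std::map and std::array<char, 1000> as the key type, the hypothetical helper comparisons_for_find counts comparator calls for one lookup; for the same N it reports exactly as many comparisons as with int keys, even though each array comparison may scan up to 1000 characters. BigKey, make_big_key, make_int_key and CountingLess are illustrative names, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>

// The fixed-size key from the text: an array of 1000 characters.
using BigKey = std::array<char, 1000>;

// Hypothetical key builders (assume i >= 0). Big keys share a 990-char
// prefix of 'x' and end in the zero-padded decimal digits of i, so their
// lexicographic order matches the numeric order of i.
BigKey make_big_key(int i) {
    BigKey k;
    k.fill('x');
    for (int pos = 999; pos >= 990; --pos) {
        k[pos] = static_cast<char>('0' + i % 10);
        i /= 10;
    }
    return k;
}
int make_int_key(int i) { return i; }

// Hypothetical comparator that forwards to operator< and counts calls.
template <typename Key>
struct CountingLess {
    std::size_t* calls;
    bool operator()(const Key& a, const Key& b) const {
        ++*calls;
        return a < b;  // for std::array this is a lexicographic comparison
    }
};

// Inserts keys 0..n-1 in order and counts the comparisons made by find(probe).
template <typename Key, typename MakeKey>
std::size_t comparisons_for_find(int n, int probe, MakeKey make_key) {
    std::size_t calls = 0;
    CountingLess<Key> less{&calls};
    std::map<Key, int, CountingLess<Key>> m(less);
    for (int i = 0; i < n; ++i) m.emplace(make_key(i), i);
    const Key probe_key = make_key(probe);  // build the key before counting
    calls = 0;
    auto it = m.find(probe_key);
    const std::size_t lookup_calls = calls;
    assert(it != m.end());
    (void)it;
    return lookup_calls;
}
```

In big-O terms the two lookups look identical; the difference lives entirely in the per-comparison constant that the notation drops.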
The same happens with many other hidden costs, such as the cost of creating the elements, which is usually treated as a constant-time operation just because it does not depend on the parameters of your problem. The cost of locating a free block of memory in each allocation does not depend on your data set but on memory fragmentation, which is outside the scope of algorithm analysis. Likewise, the cost of acquiring a lock inside malloc, which guarantees that no two threads are handed the same block of memory, depends on the contention for that lock, which in turn depends on the number of processors, the number of threads, and how many memory requests they make; again, this is outside the scope of algorithm analysis. When reading costs in big-O notation, you must be conscious of what it really means.
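Element creation is one of those hidden costs. As a minimal sketch, assuming std::map and a hypothetical CountingAllocator that simply forwards to std::allocator, you can count how many times the map asks for memory. Mainstream implementations allocate one node per inserted element, so the number of allocations grows with N, while the time each allocation takes depends on the allocator's state rather than on anything the big-O analysis models. map_allocations is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical allocator: forwards to std::allocator and counts allocate() calls.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    std::size_t* count;

    explicit CountingAllocator(std::size_t* c) noexcept : count(c) {}
    // Converting constructor: std::map rebinds the allocator to its node type.
    template <typename U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept : count(other.count) {}

    T* allocate(std::size_t n) {
        ++*count;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.count == b.count;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return !(a == b);
}

// Hypothetical helper: inserts n distinct keys and returns how many heap
// allocations the map requested in total.
std::size_t map_allocations(int n) {
    std::size_t count = 0;
    using Alloc = CountingAllocator<std::pair<const int, int>>;
    Alloc alloc(&count);
    std::map<int, int, std::less<int>, Alloc> m(alloc);
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    assert(m.size() == static_cast<std::size_t>(n));
    return count;
}
```

The difference map_allocations(2000) - map_allocations(1000) should be 1000, one node per additional element. Some implementations also allocate a sentinel node up front, which is why the difference is a more portable check than the absolute count.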