Carl, in a follow-up answer, you asked about the relative performance of these two alternatives:
z1 = dict(x.items() + y.items())
z2 = dict(x, **y)
On my machine, at least (a fairly ordinary x86_64 running Python 2.5.2), alternative z2
is not only shorter and simpler but also significantly faster. You can verify this for yourself using the timeit
module that comes with Python.
Example 1: identical dictionaries mapping 20 consecutive integers to themselves
% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z1=dict(x.items() + y.items())'
100000 loops, best of 3: 5.67 usec per loop
% python -m timeit -s 'x=y=dict((i,i) for i in range(20))' 'z2=dict(x, **y)'
100000 loops, best of 3: 1.53 usec per loop
z2
wins by a factor of 3.5 or so. Different dictionaries seem to yield quite different results, but z2
always seems to come out ahead. (If you get inconsistent results for the same test, try passing in -r
with a number larger than the default 3.)
Example 2: non-overlapping dictionaries mapping 252 short strings to integers and vice versa
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z1=dict(x.items() + y.items())'
1000 loops, best of 3: 260 usec per loop
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z2=dict(x, **y)'
10000 loops, best of 3: 26.9 usec per loop
z2
wins by about a factor of 10. That's a pretty big win in my book!
After comparing those two, I wondered if z1
's poor performance could be attributed to the overhead of constructing the two item lists, which in turn led me to wonder if this variation might work better:
from itertools import chain
z3 = dict(chain(x.iteritems(), y.iteritems()))
A few quick tests, e.g.
% python -m timeit -s 'from itertools import chain; from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z3=dict(chain(x.iteritems(), y.iteritems()))'
10000 loops, best of 3: 66 usec per loop
lead me to conclude that z3
is somewhat faster than z1
, but not nearly as fast as z2
. Definitely not worth all the extra typing.
This discussion is still missing something important, which is a performance comparison of these alternatives with the "obvious" way of merging two lists: using the update
method. To try to keep things on an equal footing with the expressions, none of which modify x or y, I'm going to make a copy of x instead of modifying it in-place, as follows:
z0 = dict(x)
z0.update(y)
A typical result:
% python -m timeit -s 'from htmlentitydefs import codepoint2name as x, name2codepoint as y' 'z0=dict(x); z0.update(y)'
10000 loops, best of 3: 26.9 usec per loop
In other words, z0
and z2
seem to have essentially identical performance. Do you think this might be a coincidence? I don't....
In fact, I'd go so far as to claim that it's impossible for pure Python code to do any better than this. And if you can do significantly better in a C extension module, I imagine the Python folks might well be interested in incorporating your code (or a variation on your approach) into the Python core. Python uses dict
in lots of places; optimizing its operations is a big deal.
[UPDATE: You could also write this as
z0 = x.copy()
z0.update(y)
as Tony does, but (not surprisingly) the difference in notation turns out not to have any measurable effect on performance. Use whichever looks right to you. Of course, he's absolutely correct to point out that the two-statement version is much easier to understand.]