To answer the original questioner's performance worries (for lookups in dict
vs set
), somewhat surprisingly, dict
lookups can be minutely faster (in Python 2.5.1 on my rather slow laptop) assuming for example that half the lookups fail and half succeed. Here's how one goes about finding out:
$ python -mtimeit -s'k=dict.fromkeys(range(99))' '5 in k and 112 in k'
1000000 loops, best of 3: 0.236 usec per loop
$ python -mtimeit -s'k=set(range(99))' '5 in k and 112 in k'
1000000 loops, best of 3: 0.265 usec per loop
doing each check several times to verify they're repeatable. So, if those 30 nanoseconds or less on a slow laptop are in an absolutely crucial bottleneck, it may be worth going for the obscure dict.fromkeys
solution rather than the simple, obvious, readable, and clearly correct set
(unusual -- almost invariably in Python the simple and direct solution has performance advantages too).
Of course, one needs to check with one's own Python version, machine, data, and ratio of successful vs failing tests, and confirm with extremely accurate profiling that shaving 30 nanoseconds (or whatever) off this lookup will make an important difference.
Fortunately, in the vast majority of cases, this will prove totally unnecessary... but since programmers will obsess about meaningless micro-optimizations anyway, no matter how many times they're told about their irrelevance, the timeit
module is right there in the standard library to make those mostly-meaningless micro-benchmarks easy as pie anyway!-)