views:

120

answers:

2

Hi Folks,

I'm looking for most fastest/effective way of deleting certain keys in a python dict

Here are some options

for k in somedict.keys(): 
    if k.startswith("someprefix"): 
        del somedict[k]

or

dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))

Logically first snippet should be faster on smaller dicts, it doesn't create a copy of a dict but creates a list of all keys, however double lookups and dict rebuilding is time consuming. While the second is faster on bigger dicts, but requires 2x more memory. I've checked my assumption in some small benchmark.

Anything faster?

Cheers

+2  A: 

If the dict is large enough, it may make sense to generate a whole new dict instead.

dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))
Ignacio Vazquez-Abrams
Thanks, my first solution was exactly as yours, interesting to benchmark which is faster.
HardQuestions
+4  A: 

Not only is del more easily understood, but it seems slightly faster than pop():

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" "  if k.startswith('f'):" "    del d[k]"
1000000 loops, best of 3: 0.733 usec per loop

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" "  if k.startswith('f'):" "    d.pop(k)"
1000000 loops, best of 3: 0.742 usec per loop

Edit: thanks to Alex Martelli for providing instructions on how to do this benchmarking. Hopefully I have not slipped up anywhere.

First measure the time required for copying:

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()"
1000000 loops, best of 3: 0.278 usec per loop

Benchmark on copied dict:

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" "  if k.startswith('f'):" "    del d1[k]"
100000 loops, best of 3: 1.95 usec per loop

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" "  if k.startswith('f'):" "    d1.pop(k)"
100000 loops, best of 3: 2.15 usec per loop

Subtracting the cost of copying, we get 1.872 usec for pop() and 1.672 for del.

Adam Bernier
You think pop faster than delete?
HardQuestions
@Mike: added benchmark to show that your way is slightly faster than using `pop()` while also being more readable IMO.
Adam Bernier
@Adam Bernier just because pop returns a value while del not. With bigger dict or bigger values the difference should be notable.
HardQuestions
@Adam, very wrong way to use `timeit`: out of those 1000000 loops, 999999 run over a 1-entry dict with `bar` as the only key (the `-s` setup code is not repeated before every loop). You need to make and alter a `d1=d.copy()` (with this statement as part of the code being measured by `timeit`) -- this kind of thing is crucial any time you're measuring code that alters data. You can normalize (to find timing ratios) by adding such a copy to all variants you're timing, separately measuring **just** the `copy`, and subtracting its time from the timings of the code variants you're considering.
Alex Martelli
@Alex: thanks very much for such a precise analysis. I hope that I have not botched your instructions in my edit.
Adam Bernier
@Adam, you're welcome, and, +1, as it seems right now (on cursory checking at least).
Alex Martelli
Guys, I think we are bit away of original question. Sure del will be always faster than pop because last return the value.
HardQuestions