tags:

views:

109

answers:

2

Using the shelve module has given me some surprising behavior. keys(), iter(), and iteritems() don't return all the entries in the shelf! Here's the code:

cache = shelve.open('my.cache')
# ...
cache[url] = (datetime.datetime.today(), value)

later:

cache = shelve.open('my.cache')
urls = ['accounts_with_transactions.xml', 'targets.xml', 'profile.xml']
try:
    print list(cache.keys()) # doesn't return all the keys!
    print [url for url in urls if cache.has_key(url)]
    print list(cache.keys())
finally:
    cache.close()

and here's the output:

['targets.xml']
['accounts_with_transactions.xml', 'targets.xml']
['targets.xml', 'accounts_with_transactions.xml']

Has anyone run into this before, and is there a workaround without knowing all possible cache keys a priori?

A: 

Seeing your examples, my first thought is that cache.has_key() has side effects, i.e. this call will add keys to the cache. What do you get for

print cache.has_key('xxx')
print list(cache.keys())
Aaron Digulla
I updated the question code with an abbreviated version of urls
Michael Deardeuff
Still doesn't compute; you add a single URL to the cache but after `has_key`, the cache has two URLs. There must be some side effect.
Aaron Digulla
The cache actually does have 2 urls in there. but keys() doesn't show them both until I test for each explicitly.
Michael Deardeuff
Can you post has_key(), please?
Aaron Digulla
+2  A: 

According to the python library reference:

...The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small...

This correctly reproduces the 'bug':

import shelve

a = 'trxns.xml'
b = 'foobar.xml'
c = 'profile.xml'

urls = [a, b, c]
cache = shelve.open('my.cache', 'c')

try:
    cache[a] = a*1000
    cache[b] = b*10000
finally:
    cache.close()


cache = shelve.open('my.cache', 'c')

try:
    print cache.keys()
    print [url for url in urls if cache.has_key(url)]
    print cache.keys()
finally:
    cache.close()

with the output:

[]
['trxns.xml', 'foobar.xml']
['foobar.xml', 'trxns.xml']

The answer, therefore, is don't store anything big—like raw xml—but rather results of calculations in a shelf.

Michael Deardeuff
In Python 2.6 I can't reproduce this "bug".
NicDumZ
You may not be using dbm, try whichdb.whichdb('my.cache'). On my mac, it produces 'dbm'. I haven't tried it out on any other systems yet.
Michael Deardeuff