I'm using Berkeley DB (BDB) to develop a small app, and I have a question about opening a database multiple times.

I have a large text corpus, and I want to load part of it to do a calculation. Here are two approaches in pseudo-code (mixed with Python):

@1

def getCorpus(token):
    DB.open()
    value = DB.get(token)  # keep the result before closing
    DB.close()
    return value

@2

# open once, before the calculation
def openCorpus():
    DB.open()

# close once, after the calculation
def closeCorpus():
    DB.close()

# look up a token on the already-open database
def getCorpus(token):
    return DB.get(token)

In the second example, I open the database before the calculation, fetch a token on each loop iteration, and close the database at the end.

In the first example, each time the loop asks for a token, I open the database, get the value, and close it again.

Is there a performance penalty with the first approach?

Note that I'm using a DBEnv to manage the database.

+3  A: 

If you don't keep the database open between lookups, you will always lose performance, because:

  • you call open() and close() on every lookup, and both are quite expensive,
  • you lose any buffers that could have been reused (both system buffers and BDB's internal buffers).

But I wouldn't worry too much about performance before the code is written.
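If you do want to measure it later, here is a rough sketch that times both patterns. It assumes Python's standard-library `dbm.dumb` as a stand-in for Berkeley DB, purely so it runs without extra packages; the names `build_corpus`, `get_corpus_reopen`, `Corpus` and the temporary path are made up for illustration. Absolute numbers will be different with BDB and a DBEnv, so treat this only as a template for your own measurement.

```python
import dbm.dumb
import os
import tempfile
import time

# ASSUMPTION: dbm.dumb is only a stand-in for Berkeley DB so that the
# sketch runs with the standard library. The path is hypothetical.
DB_PATH = os.path.join(tempfile.mkdtemp(), "corpus")
TOKENS = ["token%d" % i for i in range(100)]


def build_corpus(path):
    """Create a small key/value store to query."""
    db = dbm.dumb.open(path, "c")
    try:
        for i, token in enumerate(TOKENS):
            db[token.encode()] = ("text for token %d" % i).encode()
    finally:
        db.close()


def get_corpus_reopen(path, token):
    """Pattern @1: open, get and close on every lookup."""
    db = dbm.dumb.open(path, "r")
    try:
        return db[token.encode()]
    finally:
        db.close()


class Corpus:
    """Pattern @2: open once, reuse the handle, close at the end."""

    def __init__(self, path):
        self._db = dbm.dumb.open(path, "r")

    def get(self, token):
        return self._db[token.encode()]

    def close(self):
        self._db.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


build_corpus(DB_PATH)

# Time pattern @1: one open/close pair per token.
start = time.perf_counter()
reopen_results = [get_corpus_reopen(DB_PATH, t) for t in TOKENS]
reopen_seconds = time.perf_counter() - start

# Time pattern @2: a single open/close pair around the whole loop.
start = time.perf_counter()
with Corpus(DB_PATH) as corpus:
    reuse_results = [corpus.get(t) for t in TOKENS]
reuse_seconds = time.perf_counter() - start

print("reopen per lookup: %.6f s" % reopen_seconds)
print("open once:         %.6f s" % reuse_seconds)
```

Both loops return the same values; only the number of open/close calls differs. Wrapping the handle in a context manager also means the database gets closed even if the calculation raises.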

Piotr Czapla