views: 562
answers: 3
I've spent hours searching for examples of how to use the bsddb module and the only ones that I've found are these (from here):

data = mydb.get(key)
if data:
    doSomething(data)
#####################
rec = cursor.first()
while rec:
    print rec
    rec = cursor.next()
#####################
rec = mydb.first()
while rec:
    key, val = rec
    doSomething(key, val)
    rec = mydb.next()

Does anyone know where I could find more (practical) examples of how to use this package?

Or would anyone mind sharing code that they've written themselves that used it?

Edit:

The reason I chose Berkeley DB was its scalability. I'm working on a latent semantic analysis of about 2.2 million web pages. A simple test on 14 web pages generated around 500,000 records, so extrapolating, my table will hold roughly 78.6 billion records.

If anyone knows of another efficient, scalable database model that I can access from Python, please let me know about it! (*lt_kije* has brought it to my attention that bsddb is deprecated in Python 2.6 and will be gone in 3.*)

+3  A: 

Searching for "import bsddb", I get:

...but personally I'd strongly recommend you use sqlite instead of bsddb; people use sqlite a lot more, for good reason.
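For comparison, here is a minimal sketch of what the sqlite route might look like via the standard-library `sqlite3` module (the table and column names are made up for illustration, loosely modeled on the LSA use case in the question):

```python
import sqlite3

# an in-memory database; pass a filename instead to persist to disk
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# hypothetical schema: one row per (url, term) pair
cur.execute('CREATE TABLE terms (url TEXT, term TEXT, weight REAL)')
cur.execute('INSERT INTO terms VALUES (?, ?, ?)',
            ('http://example.com', 'semantics', 0.42))
conn.commit()

# parameterized queries via the DBAPI interface
cur.execute('SELECT term, weight FROM terms WHERE url = ?',
            ('http://example.com',))
row = cur.fetchone()  # -> ('semantics', 0.42)
conn.close()
```

Unlike bsddb's key/value interface, this gives you SQL queries and secondary lookups for free, at the cost of some per-row overhead.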

James Antill
Thanks for telling me how you found them too. I'd forgotten that trick.
tgray
Unfortunately I don't think sqlite will scale well enough for my application (updated question). If you know that sqlite will work (with some certainty), please let me know!
tgray
I'm not sure sqlite will scale that well, but I'm not sure bsddb will either. If you are creating the data once and then mostly reading it, cdb might be your best bet.
James Antill
I'm using Windows, so I don't think cdb is an option. At least, the docs say it is for UNIX.
tgray
+3  A: 

These days, most folks use the anydbm meta-module to interface with db-like databases. But the API is essentially dict-like; see PyMOTW for some examples. Note that bsddb is deprecated in 2.6.1 and will be gone in 3.x. Switching to anydbm will make the upgrade easier; switching to sqlite (which is now in stdlib) will give you a much more flexible store.
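To illustrate the dict-like API mentioned above, here is a minimal sketch. It uses the Python 3 module name `dbm` so it runs on current interpreters; under Python 2 (which this thread predates) the meta-module is called `anydbm`, with the same interface. The filename `example.db` is chosen for illustration:

```python
import dbm  # under Python 2, this module is called anydbm

# 'c' opens the database for read/write, creating it if it doesn't exist
db = dbm.open('example.db', 'c')

# records are stored and retrieved like dictionary entries (as bytes)
db[b'key1'] = b'value1'
db[b'key2'] = b'value2'

print(db[b'key1'])      # fetch a single record
for key in db.keys():   # iterate over all stored keys
    print(key, db[key])

db.close()
```

Because `anydbm`/`dbm` picks whichever backend is available at runtime, code written against it keeps working even when the underlying storage module (bsddb, gdbm, etc.) changes.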

lt_kije
But how scalable is SQLite? One of the reasons I chose Berkeley DB was because "Berkeley DB scales up extremely well. It can manage multi-terabyte tables with single records as large as four gigabytes."
tgray
I think SQLite can handle databases up to 2 TB, though I haven't pushed it nearly that far myself. Your quote seems to come from Oracle's Berkeley DB documentation; I don't believe that has much to do with the implementations supported by Python. What exactly are you trying to do?
lt_kije
Ah -- your new comment helps. ;) At that scale, I think you're best off using an RDBMS (PostgreSQL, MySQL, etc). SQLite will be a good starting place, since it provides a DBAPI interface that will be compatible with the major RDBMS connectors in Python.
lt_kije
Thanks for the tip! I'll go check them out.
tgray
bsddb is deprecated only because it was too difficult for the Python team to maintain; it will still be developed as an external module. SQLite is an SQL database and as such has more overhead than bsddb.
Ed L
+1  A: 

You may have a look at the sources of the pybsddb package; there are several examples in there, dbshelve and dbtables among them.

phmr