I'm looking for general experiences from people who have used both, particularly on how the two compare on handling large numbers of records, transaction/concurrency/deadlock handling, and juicy stories about database corruption and backup procedures.
Haven't used either, but I've read that Tokyo Cabinet appears to win on performance. Also, according to Simon Buchan (in the comments) and other accounts, Berkeley DB is unreliable, especially when coupled with a Subversion repository.
BDB is not only a pain in the ass to configure but when you start hitting some magic limit of a million or so records, performance drops drastically even in CDS-mode. Tokyo Cabinet performs really well even beyond millions of records. I recommend TC in every way.
Interesting comments. One additional comment regarding BDB: for Java users, there are two choices: native BDB ("bdb-c"), which is probably what most of the comments are about, and BDB-JE, the Java-based version. These are two very different code bases, with different trade-offs and even different reliability.
From what I understand, JE has very good concurrency support for writes (because of its log-based approach), but somewhat lower read performance. JE is also claimed to be much less prone to corruption or lockups. But it really shines mostly in use cases where the number of writes is non-trivial, such as when it is used as a backend for message queues (write, read, delete cycle).
I've used BDB for 3 years at Bookmooch.com to great success. Performance has been stunning, where I regularly run about 300,000 queries per second in a production environment (real code, lots of processing, not a benchmark) on a single machine. Test harnesses see around 1.2 million queries per second, but that's not the real world.
BDB is very stable, and the API is nice and well documented, as long as you can either write C or write your own middleware layer (my approach was to write my own middleware layer, plus a little bit of C for some performance-critical things).
PS: my database is fairly large, with the largest table at 6 million rows and 32 GB of data (it's a cache of Amazon's book data for 6 million books). I haven't seen any performance decreases, though I did switch the db drive to an SSD, which was essential for very high performance.
The main problem I ran into was how easily the database got corrupted after an application crash. This was mostly due to a tech decision of mine that led to lots of redundant writes to the database. That made the logs very large, so I had to purge them fairly soon, which compromises the db's ability to use the log to recover from a crash. I've mostly fixed this by buffering writes for 60 seconds, so only a crash during the write cycle could cause corruption. If you don't delete your log files, you won't have data corruption issues like I had.
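For anyone wondering what that kind of write buffering could look like, here is a minimal sketch in Python. It is not the BookMooch code: `BufferedStore`, its parameters, and the dict-like `backend` are all hypothetical stand-ins, and a real BDB or Tokyo Cabinet handle would replace the plain dict. The idea is only that repeated writes to the same key are coalesced in memory and pushed to the store in one batch per interval.

```python
import time


class BufferedStore:
    """Coalesce writes in memory and flush them to a backing store in batches.

    Hypothetical illustration: `backend` is any dict-like object standing in
    for a real key-value database handle.
    """

    def __init__(self, backend, flush_interval=60.0, clock=time.monotonic):
        self.backend = backend
        self.flush_interval = flush_interval  # seconds between batch writes
        self.clock = clock                    # injectable so it can be tested
        self.pending = {}                     # key -> latest unflushed value
        self.last_flush = clock()

    def put(self, key, value):
        # Repeated writes to the same key overwrite each other in memory,
        # so only the last value ever reaches the backend.
        self.pending[key] = value
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def get(self, key):
        # Serve unflushed values first so readers see their own writes.
        if key in self.pending:
            return self.pending[key]
        return self.backend[key]

    def flush(self):
        # This loop is the only window in which the backend is being written.
        for key, value in self.pending.items():
            self.backend[key] = value
        self.pending.clear()
        self.last_flush = self.clock()
```

The trade-off is that anything still sitting in `pending` is lost if the process dies before the next flush, so this only suits data you can afford to lose or rebuild, such as a cache.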
Tokyo Cabinet looks interesting, but very new, so I'm waiting for now.
-- John from BookMooch
Tokyo Cabinet is LGPL. Its successor project (Kyoto Cabinet) is GPL, though. From a licensing point of view, that makes it as good or as bad as BDB. A pity...