I'm looking for general experiences from people who have used both, particularly on how the two compare on handling large numbers of records, transaction/concurrency/deadlock handling, and juicy stories about database corruption and backup procedures.

+6  A: 

Haven't used either, but I've read that Tokyo Cabinet appears to win on performance. Also, according to Simon Buchan (in the comments) and other accounts, Berkeley DB is unreliable, especially when coupled with a Subversion repository.

Nikhil Chelliah
Despite the FOSS cries of 'many eyes', I've found bugginess to be more a function of the last developer's competence first, and the overall simplicity of the architecture second, rather than of total man-years.
Simon Buchan
Oh, and according to the SVN project, Berkeley DB 'wedges' itself pretty often, so they recommend FSFS (just files on disk). That may be specific to their use of it, though.
Simon Buchan
Thanks for the link to the TokyoCabinet review on that blog!
Andy Dent
As Simon Buchan points out, Berkeley DB does screw up when used with SVN. So it's probably not perfect either.
Robert Gould
Thanks for the note about BDB and Subversion. That said, I just came across a contradictory article written two years ago: http://weblogs.asp.net/psteele/archive/2007/01/25/svn-and-berkeleydb.aspx
Nikhil Chelliah
@Robert Gould: I think you may have misread what I wrote; I didn't see many advantages for BDB.
Nikhil Chelliah
I feel a lot of the BDB information from the early Subversion/MySQL days is rather outdated; BDB has matured quite a bit and is used with good reliability by a number of projects...
fdr
+8  A: 

BDB is not only a pain in the ass to configure; once you start hitting some magic limit of a million or so records, performance also drops drastically, even in CDS (Concurrent Data Store) mode. Tokyo Cabinet performs really well even beyond millions of records. I recommend TC in every way.

rsms
+2  A: 

Interesting comments. One additional comment regarding BDB: for Java users, there are two choices: native BDB ("bdb-c"), which is probably what most of the comments here are about, and BDB-JE, the Java-based version. These are two very different code bases, with different trade-offs and even different reliability.

From what I understand, JE has very good concurrency support for writes (because of its log-based approach), but somewhat lower read performance. JE is also claimed to be much less prone to corruption or lockups. But it really shines mostly in use cases where the number of writes is non-trivial, such as when it is used as the backend for a message queue (a write, read, delete cycle).
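To illustrate why a log-based design tends to favor writes over reads, here is a minimal sketch of a generic append-only log store with an in-memory index, run through the write, read, delete cycle mentioned above. This is NOT BDB-JE's actual implementation or API; the class `AppendOnlyLog` and its methods are hypothetical. Writes only append to the end of the file (sequential I/O), reads need an index lookup plus a seek, and deletes are appended "tombstone" records that are resolved when the log is replayed.

```python
import json
import os
import tempfile


class AppendOnlyLog:
    """Hypothetical log-structured key-value store (illustration only, not BDB-JE)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest "put" record for that key
        # Append+read binary mode: every write lands at the end of the file.
        self.f = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        # Recovery: replay the whole log; later records override earlier ones.
        self.f.seek(0)
        offset = 0
        for line in self.f:
            record = json.loads(line)
            if record["op"] == "put":
                self.index[record["key"]] = offset
            else:  # "del" tombstone
                self.index.pop(record["key"], None)
            offset += len(line)

    def _append(self, record):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write((json.dumps(record) + "\n").encode("utf-8"))
        self.f.flush()
        return offset

    def put(self, key, value):
        self.index[key] = self._append({"op": "put", "key": key, "value": value})

    def delete(self, key):
        # Deletes never erase data in place; they append a tombstone record.
        self._append({"op": "del", "key": key})
        self.index.pop(key, None)

    def get(self, key):
        # Reads cost an index lookup plus a random seek into the log.
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)
        return json.loads(self.f.readline())["value"]

    def close(self):
        self.f.close()


# Usage: a message-queue style write, read, delete cycle.
path = os.path.join(tempfile.mkdtemp(), "queue.log")
log = AppendOnlyLog(path)
log.put("msg1", "hello")
log.put("msg2", "world")
print(log.get("msg1"))  # hello
log.delete("msg1")
print(log.get("msg1"))  # None
log.close()

# Reopening replays the log to rebuild the index.
log = AppendOnlyLog(path)
print(log.get("msg2"))  # world
log.close()
```

A real log-structured engine also has to compact the log (dropping overwritten and deleted records) so it doesn't grow forever; this sketch leaves that out.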

StaxMan
+4  A: 

I've used BDB for 3 years at Bookmooch.com with great success. Performance has been stunning: I regularly run about 300,000 queries per second in a production environment (real code, lots of processing, not a benchmark) on a single machine. Test harnesses see around 1.2 million queries per second, but that's not the real world.

BDB is very stable, and the API is nice and well documented, as long as you can either write C or write your own middleware layer (my approach was to write my own middleware layer, plus a little bit of C for some performance-critical things).

PS: my database is fairly large, with the largest table at 6 million rows and 32 GB of data (it's a cache of Amazon's book data for 6 million books). I haven't seen any performance decreases, though I did switch the DB drive to an SSD, which was essential for very high performance.

The main problem I ran into was how easily the database became corrupted after an application crash. This was mostly due to a technical decision of mine that led to lots of redundant writes to the database. That made the logs very large, so I had to purge them fairly soon, which compromises the database's ability to use the log to recover from a crash. I've mostly fixed this by buffering writes for 60 seconds, so only a crash during the write cycle could cause corruption. If you don't delete your log files, you won't have data corruption issues like I had.
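The "buffer writes for 60 seconds" fix can be sketched as a write-behind buffer: writes are collected in memory, repeated writes to the same key collapse into one, and the batch is written to the store once the interval has elapsed. This is a minimal sketch under assumptions, not John's actual code: `WriteBehindBuffer` is a hypothetical name, and `store` stands in for any dict-like backend with `get()` and item assignment (a plain `dict` in the usage below).

```python
import time


class WriteBehindBuffer:
    """Hypothetical write-behind buffer: coalesces writes and flushes them in batches."""

    def __init__(self, store, interval=60.0, clock=time.monotonic):
        self.store = store          # any dict-like backend (assumption)
        self.interval = interval    # seconds between flushes (60 s, as in the answer)
        self.clock = clock          # injectable so the behavior can be tested
        self.pending = {}           # key -> latest value not yet written to the store
        self.last_flush = clock()

    def put(self, key, value):
        # Repeated writes to the same key overwrite each other in memory,
        # so the store sees at most one write per key per flush cycle.
        self.pending[key] = value
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def get(self, key):
        # Check the buffer first, otherwise reads could return stale data.
        if key in self.pending:
            return self.pending[key]
        return self.store.get(key)

    def flush(self):
        for key, value in self.pending.items():
            self.store[key] = value
        self.pending.clear()
        self.last_flush = self.clock()


# Usage with a fake clock, so the timing is deterministic.
now = [0.0]


def fake_clock():
    return now[0]


backend = {}
buf = WriteBehindBuffer(backend, interval=60.0, clock=fake_clock)
buf.put("book:1", "v1")
buf.put("book:1", "v2")  # coalesced with the previous write
print(backend)           # {}
now[0] = 61.0
buf.put("book:2", "v1")  # interval has elapsed, so this triggers a flush
print(backend)           # {'book:1': 'v2', 'book:2': 'v1'}
```

The trade-off is that anything still in `pending` is lost if the process crashes before a flush; the answer accepts that in exchange for far fewer (and smaller) log writes. This sketch only checks the timer when a write arrives; a production version would presumably also flush from a timer and on shutdown.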

Tokyo Cabinet looks interesting, but it's very new, so I'm waiting for now.

-- John from BookMooch

John from BookMooch
A: 

Tokyo Cabinet is LGPL. Its successor project (Kyoto Cabinet) is GPL, though. From a licensing point of view, that makes it as good or as bad as BDB. A pity...

fubra