views:

327

answers:

3

I'm designing a Java based web-app and I need a key-value store. Berkeley DB seems fitting enough for me, but there appears to be TWO Berkeley DBs to choose from: Berkeley DB Core which is implemented in C, and Berkeley DB Java Edition which is implemented in pure Java.

The question is, how to choose which one to use? With web-apps scalability and performance is quite important (who knows, maybe my idea will become the next Youtube), and I couldn't find easily any meaningful benchmarks between the two. I have yet to familiarize with Cores Java API, but I find it hard to believe that it could be much worse than Java Editions, which seems to be quite nice.

If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and keys probably will be hashes of the data, or some other unique id.

+1  A: 

If you derive a common interface to these, and have a suitable set of unit tests, you should be able to swap between the two trivially at a later date (perhaps when you really need to make a decision based on hard facts that are not available right now)

Brian Agnew
Just a warning to this: The databases themselves will *not* be portable between the versions. If you go down this route you'll need a migration strategy for data itself if you find yourself wanting to swap implementations. For this reason, if portability in the data is important, you're better off going with Berkeley DB and the Java API over the Java Edition.
Shaun
A: 

I faced the same problem and decided to go with the Java edition, mainly because of its portability(I need something that would ran even on mobile devices). There are also the Direct Persistence Layer (DPL) API and the fact that the whole db is a single jar makes its deployment fairly simple.

The recent version 4 brought in High availability and performance improvements. There is also the fact that long running java applications can achieve such an optimization, that they would surpass native C applications performance in some scenarios.

It's a natural fit for any Java application - desktop or web.

Bozhidar Batsov
+2  A: 

I while ago I was having the same question, after doing some benchmarks I found that hash mode in the native edition is much faster and storage efficient than anything the java edition has to offer, so I decided to go with the native implementation.

I suggest you do your own benchmarks for the storage capacities you expect and decide if the Java edition is fast enough.

if it is, or if performance is not a big issue for you (it's critical for me), just go with the Java edition. otherwise go with the native one (assuming you see the same performance boost for your own use case).

btw: my benchmark was test the speed of querying random keys out of 20,000,000 records, where the key is a string and the value is an int (4 bytes). I saw that inserts (populating the benchmark) was much faster with the native version, and queries was twice as fast.

(This is not due to Java shortcoming but because the Java version is not of the same version as the native version - 4.0 vs 4.8 IIRC).

Omry