views:

297

answers:

2

Good news! Since version 4.8 BerkeleyDB has c# interface. BerkeleyDB is a quite interesting thing for me due to it's non-SQL nature. I know it's an excellent tool if someone wants to store a lot of key/value pairs. And I know about 'attachable' tables. What i don't know is how to store hierarchical data with BerkeleyDB. Is it suitable for this in general?

What i want to do? I want to store dmoz.org data. Now i have all thous rdfs imported to MySQL db. But i don't need stored procedures or another complex features. I want to use BerkeleyDB as a data store for my online RSS reader. So there is feeds in a category tree(as i said categories i've imported from dmoz. and i have a LOT of them, as well as feeds - millions). And... i forgot about feed items. i want to store them too with BerkleyDB :-).

It's look like i have to implement all relations manually,,, It's ok... But the most important thing i asking about is the speed. Will(Can) my solution with BerkeleyDB be faster then one based on MySQL(or on any RDBMS in general)?

+1  A: 

Hierarchical structures can be stored in key-value stores using a parent or child attribute.

If you want a parent to have 1 or more children, use a parent attribute on each record and have root nodes have a parent of ID 0 or some other meaningful value.

If you want a child to have 1 or more parents, use a child attribute on each record.

If you want nodes may have multiple parents, and children use a separate table to store the relationships.

This way you can traverse the tree by by querying for nodes that have a certain parent or child.

Ben S
So, in short, i have to implement relations manually? How about the speed?
Trickster
Yes, but this is the way with all RDBMS. If you use an index on the relationship column, it's just a single indexed row look-up, so performance should be good so long as you're not traversing the entire hierarchy constantly.
Ben S
Gave you +1. Thank you!
Trickster
+2  A: 

It is suitable for that, but it may be more work than you're willing to put in. BerkeleyDB is a very general key/value store, so all you do is say "for key X, store value Y". Later you can say "give me the value of key X" and it will give you back Y. That's really all it does from a high level. It has very robust features for guaranteeing important reliability properties (called ACID, for Atomicity, Consistency, Isolation, and Durability), and has great performance, but from the programmer's perspective, it's a simple map structure.

So yes, you can store trees, but you'd need to decide on a good representation for them. You can go for integer keys (make sure they're stored in big-endian byte order because BDB uses lexicographic ordering on keys) and simply have a struct as the value containing a list of integers for children. You would still have to write all your traversal algorithms by hand, though. Without knowing what requirements you have for your hierarchical data though, it's hard to give a more concrete suggestion.

Speedwise, for what it does Berkeley DB probably can't get much faster (i.e., you won't find much out there that is faster, especially if you're willing to sacrifice some of the ACID properties). It gives you almost complete control over your interface to the map, so in theory you could probably build a highly optimized structure for your particular use case. However, given the low-level interface, if you're implementing joins, complex filter queries, or any kind of nontrivial query language on top of it, you'll have to write some very speedy code and algorithms to keep up with the big relational databases out there.

If your data can be modeled by XML (eugh, but I know some people like it), there is an existing database built on top of BDB called BDB XML (also by Sleepycat, now part of Oracle). This allows you to store arbitrary XML documents in the database, and to perform fast XPath and XQuery queries on the database. I don't think there's an official .NET API to this yet, but I'm pretty sure I've come across an unofficial .NET binding to it.

In general, unless you have some very particular requirements that existing solutions out there don't allow (this does not appear to be the case with your scenario), I'd advise against rolling your own database (even built on top of BDB) unless you're very skilled with efficient algorithms and code optimization. If you're storing RDF triples, there are dedicated databases for that, and even relational databases aren't particularly unsuited for them. BDB XML is still a viable solution for that, too. It's ultimately your choice, but if I were you I'd choose to work on the more interesting problems without having to deal with low-level database operations (and would thus use a thin layer over existing package for my actual RDF store).

pumpkin
i've updated my question
Trickster
Updated to address that :)
pumpkin
This online feed reader not just a big data store. It'll have strong scientific background(text processing and knowledge extraction). So you think MySQL will be alright in this case?
Trickster