BerkeleyDB is the database equivalent of a Ruby hashtable or a Python dictionary except that you can store multiple values for a single key.

My question is: If you wanted to store a complex datatype in a storage structure like this, how could you go about it?

In a normal relational table, if you want to represent a Person, you create a table with columns of particular data types:

Person
-id:integer
-name:string
-age:integer
-gender:string

When it's written out like this, you can see how a person might be understood as a set of key/value pairs:

id=1;
name="john";
age=18;
gender="male";

Decomposing the person into individual key/value pairs (name="john") is easy.

But in order to use the BerkeleyDB format to represent a Person, you would need some way of recomposing the person from its constituent key/value pairs.

For that, you would need to impose some artificial encapsulating structure to hold a Person together as a unit.

Is there a way to do this?

EDIT: As Robert Harvey's answer indicates, there is an entity persistence feature in the Java edition of BerkeleyDB. Unfortunately, because I will be connecting to BerkeleyDB from a Ruby application using Moneta, I will be using the standard edition, which I believe requires me to create a custom solution in the absence of this support.

+1  A: 

Have a look at this documentation for an Annotation Type Entity:

http://www.oracle.com/technology/documentation/berkeley-db/je/java/com/sleepycat/persist/model/Entity.html

Robert Harvey
Excellent link, thanks. Almost exactly what I was looking for. Unfortunately that feature is available in the Java edition but not in the standard edition that I would be using in Ruby: http://www.oracle.com/technology/products/berkeley-db/db/index.html
Hank
+2  A: 

If your datastore allows it (and BerkeleyDB does, AFAICT), I'd just store a representation of the object's attributes keyed by the object id, without splitting the attributes across different keys.

E.g. given:

Person
 -id:1
 -name:"john"
 -age:18
 -gender:"male"

I'd store the YAML representation in BerkeleyDB with the key person_1:

--- !ruby/object:Person 
attributes: 
  id: 1
  name: john
  age: 18
  gender: male
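
A minimal sketch of this approach in Ruby, assuming a plain Hash stands in for the BerkeleyDB store (any store with a similar `[]` / `[]=` interface should behave the same way). The helper names `save_person` and `load_person` are made up for illustration, and the attributes are dumped as a plain hash rather than as a tagged `!ruby/object:Person` document so that `YAML.safe_load` can read them back:

```ruby
require 'yaml'

# Stand-in for the key/value store; using a plain Hash is an assumption.
STORE = {}

# Hypothetical helper: serialize all attributes under one key, e.g. "person_1".
def save_person(store, attrs)
  store["person_#{attrs['id']}"] = YAML.dump(attrs)
end

# Hypothetical helper: fetch the YAML string and rebuild the attribute hash.
# Returns nil when no record exists for the given id.
def load_person(store, id)
  yaml = store["person_#{id}"]
  yaml && YAML.safe_load(yaml)
end

save_person(STORE, 'id' => 1, 'name' => 'john', 'age' => 18, 'gender' => 'male')
person = load_person(STORE, 1)
```

The whole record travels as one value, so recomposing the person is a single lookup followed by a single parse.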

If instead you need to store each attribute under its own key in the datastore (why?), make sure each key is linked to the record's identifying attribute, which for an ActiveRecord is the id.

In this case you'd store these keys in BerkeleyDB:

person_1_name="john"; 
person_1_age=18; 
person_1_gender="male";
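
A rough sketch of the per-attribute layout, again assuming a plain Hash in place of the real store. `PERSON_ATTRIBUTES`, `decompose_person` and `recompose_person` are hypothetical names; the attribute list has to be known up front, because that is what lets the record be put back together:

```ruby
# Attribute names must be known in advance so the record can be recomposed.
PERSON_ATTRIBUTES = %w[name age gender].freeze

# Hypothetical helper: write one key per attribute, e.g. "person_1_name".
# Values are stored as strings, as a byte-oriented store would hold them.
def decompose_person(store, id, attrs)
  PERSON_ATTRIBUTES.each do |attr|
    store["person_#{id}_#{attr}"] = attrs[attr].to_s
  end
end

# Hypothetical helper: read every attribute key back into one hash.
def recompose_person(store, id)
  attrs = { 'id' => id }
  PERSON_ATTRIBUTES.each do |attr|
    attrs[attr] = store["person_#{id}_#{attr}"]
  end
  attrs
end

store = {} # plain Hash standing in for the key/value store (assumption)
decompose_person(store, 1, 'name' => 'john', 'age' => 18, 'gender' => 'male')
person = recompose_person(store, 1)
```

Note that in this sketch the age comes back as the string "18", so type information has to be restored by the caller. That is one reason the single-key representation is usually simpler.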
LucaM
This is close to the right strategy. Be careful about handling objects that "has_a" other objects; most of the time you really want to store a reference to the other object. I did some googling, and it doesn't seem like Ruby has a real object database, which is too bad. Take a look at Perl's KiokuDB or CL's Elephant for examples of how to do this sort of thing right.
jrockway
Yep, right. If BerkeleyDB were not a strict requirement, I'd also look at Redis: http://code.google.com/p/redis/ since it gives the developer a much richer vocabulary of data structures (lists, sets) with well-defined computational complexity (e.g. you know the Big-O complexity of each operation).
LucaM
+3  A: 

You can always serialize (called marshalling in Ruby) the data as a string and store that instead. The serialization can be done in several ways.

With YAML (advantage: human readable, multiple implementations in different languages):

require 'yaml'; str = person.to_yaml

With Marshal (Ruby-only, and even specific to the Ruby version):

Marshal.dump(person)

This only works if a person is a self-contained entity, i.e. its class does not refer to other objects that you do not want included in the serialized data. References to other persons, for example, would need to be handled differently.
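
One way this could look, as a sketch only: the `Person` struct, its `friend_id` field and the second record ("jane") are invented for illustration, and a plain Hash stands in for the store. Instead of embedding another Person, each record keeps the id of the person it refers to, so `Marshal.dump` serializes only the record itself:

```ruby
# Hypothetical Person type; friend_id holds the id of another record
# instead of embedding that Person object.
Person = Struct.new(:id, :name, :age, :gender, :friend_id)

store = {} # plain Hash standing in for the key/value store (assumption)

john = Person.new(1, 'john', 18, 'male', 2)
jane = Person.new(2, 'jane', 19, 'female', nil)

# Each person is marshalled on its own and stored under "person_<id>".
[john, jane].each do |person|
  store["person_#{person.id}"] = Marshal.dump(person)
end

# Load one record, then follow the reference with a second lookup.
loaded = Marshal.load(store['person_1'])
friend = Marshal.load(store["person_#{loaded.friend_id}"])
```

`Marshal.load` should only be used on data you trust, since it can instantiate arbitrary Ruby objects.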

Rutger Nijlunsing