I am trying to come up with a good design for the storage of a data model. The language is Python, but I guess the question is fairly language-agnostic.

At the moment I envision three possible strategies:

Object database

The datamodel is a network of objects. When I define their classes, I declare them as descendants of a persistence object. Example:

class Tyres(PersistentObject):
    def __init__(self,brand):
        self._brand = brand

class Car(PersistentObject):
    def __init__(self,model):
        self._model = model
        self._tyres = None
    def addTyres(self,tyres):
        self._tyres = tyres
    def model(self):
        return self._model

The client code is not aware of the persistence: it manipulates the objects as if they were in memory, and the persistence object takes care of everything behind the scenes. Retrieval can be done via a keyed lookup on a database object. This is the approach that the Zope Object Database (among many others) uses. The advantages are lazy retrieval and the fact that only the objects that actually changed are written, without retrieving the ones that are untouched.
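To make the "only changed objects are written" idea concrete, here is a minimal toy sketch. It is not how ZODB is implemented: the `_dirty` list, the `commit()` function and the dict used as storage are all assumptions made up for this illustration, and lazy retrieval is not shown.

```python
_dirty = []  # hypothetical registry of objects changed since the last commit


class PersistentObject:
    """Toy base class: every attribute assignment marks the instance as changed."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if self not in _dirty:
            _dirty.append(self)


def commit(storage):
    """Write only the changed objects into `storage` (a plain dict here).

    Returns the number of objects written.
    """
    written = len(_dirty)
    for obj in _dirty:
        storage[id(obj)] = dict(vars(obj))
    _dirty.clear()
    return written


class Car(PersistentObject):
    def __init__(self, model):
        self._model = model
        self._tyres = None


storage = {}
bmw = Car("BMW")
fiat = Car("Fiat")
first = commit(storage)    # both objects are new, so both are written

fiat._model = "Fiat 500"   # client code just sets an attribute
second = commit(storage)   # only the modified object is written again
```

The client code never calls a save method on `bmw` or `fiat`; the base class notices the assignment on its own.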

Shelving objects

The data model given above is represented in memory, but a database is then used to push or pull data as monolithic entities. For example:

 car = Car("BMW")
 tyres = Tyres("Bridgestone")
 car.addTyres(tyres)
 db.store(car)

This is what a pickle-based solution does. It is, in some sense, similar to the previous solution, with the only difference that you store the object as a single bundle and retrieve it again as a single bundle.
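As a rough sketch of this strategy with the standard library, `shelve` (which pickles values into a dbm file) can play the role of `db`. The key `"car-1"`, the file name `garage` and the simplified `Car`/`Tyres` classes are assumptions for the example only.

```python
import os
import shelve
import tempfile


class Tyres:
    def __init__(self, brand):
        self.brand = brand


class Car:
    def __init__(self, model):
        self.model = model
        self.tyres = None

    def addTyres(self, tyres):
        self.tyres = tyres


def store_and_reload(path):
    car = Car("BMW")
    car.addTyres(Tyres("Bridgestone"))

    # The whole object graph (car plus its tyres) is pickled as one bundle.
    with shelve.open(path) as db:
        db["car-1"] = car

    # Reading it back unpickles the whole bundle again.
    with shelve.open(path) as db:
        return db["car-1"]


with tempfile.TemporaryDirectory() as tmp:
    loaded = store_and_reload(os.path.join(tmp, "garage"))
```

Note that changing `loaded.tyres.brand` afterwards does nothing to the file until the car is stored again as a whole, which is exactly the "single bundle" trade-off.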

The Facade

A single database class with convenience methods. Client code never handles objects, only ids. Example:

class Database:
    def __init__(self):
        """Set up the connection."""

    def createCar(self, model):
        """Create the car; return a numeric key car_id."""

    def createTyresForCar(self, car_id, brand):
        """Create the tyres for car_id; return a numeric id tyres_id."""

    def getCarModel(self, car_id):
        """Return the car model for the given car identifier."""

    def getTyresBrand(self, car_id, tyres_id):
        """Return the tyre brand for tyres_id in car_id.

        Raise an error if tyres_id is not in car_id. The car_id argument
        looks redundant, but tyres_id is not guaranteed to identify the
        tyres uniquely across the whole database.
        """

This solution is rather controversial. The database class can end up with a lot of responsibilities, but I kind of have the feeling that this is the philosophy used in SOAP: you don't get to manipulate an object directly, you send inquiries about object properties to a remote server. In the absence of SQL, this would likely be the interface to a relational database: db.createTable(), db.insert(), db.select(). SQL reduces this to a very simple db interface, db.query(sql_string), at the price of parsing and executing a language (SQL). You still get to operate on the subparts of the data model you are interested in, without touching the others.
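For illustration, the facade could be backed by SQLite along these lines. This is only a sketch under assumptions: the two-table schema and the choice of `KeyError` as the error type are invented here, not something fixed by the design above.

```python
import sqlite3


class Database:
    """Facade sketch: callers only ever see integer ids, never objects."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        # Assumed schema, made up for this example.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS car ("
            "id INTEGER PRIMARY KEY, model TEXT NOT NULL)"
        )
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS tyres ("
            "id INTEGER PRIMARY KEY, "
            "car_id INTEGER NOT NULL REFERENCES car(id), "
            "brand TEXT NOT NULL)"
        )

    def createCar(self, model):
        cur = self._conn.execute("INSERT INTO car (model) VALUES (?)", (model,))
        self._conn.commit()
        return cur.lastrowid

    def createTyresForCar(self, car_id, brand):
        cur = self._conn.execute(
            "INSERT INTO tyres (car_id, brand) VALUES (?, ?)", (car_id, brand)
        )
        self._conn.commit()
        return cur.lastrowid

    def getCarModel(self, car_id):
        row = self._conn.execute(
            "SELECT model FROM car WHERE id = ?", (car_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no car with id {car_id}")
        return row[0]

    def getTyresBrand(self, car_id, tyres_id):
        # Both ids are checked, so tyres that belong to another car are rejected.
        row = self._conn.execute(
            "SELECT brand FROM tyres WHERE id = ? AND car_id = ?",
            (tyres_id, car_id),
        ).fetchone()
        if row is None:
            raise KeyError(f"tyres {tyres_id} do not belong to car {car_id}")
        return row[0]


db = Database()
car_id = db.createCar("BMW")
tyres_id = db.createTyresForCar(car_id, "Bridgestone")
```

Even in this tiny version, every new property of a car or tyre means another method on the one class, which is where the worry about it growing out of hand comes from.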

I would like to ask your opinion about the three designs, and in particular the third. When is it a good design, if ever?

The inverted logic

This is something I've seen in the MediaWiki code. Instead of having something like

 db.store(obj)

they have

 obj.storeOn(db)
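A minimal sketch of what the inverted form implies: the object decides what gets persisted, and the database only offers a generic write operation. `InMemoryDb` and its `write()` method are hypothetical names for this example, not MediaWiki's API.

```python
class InMemoryDb:
    """Hypothetical backend that just keeps records in a dict."""

    def __init__(self):
        self.records = {}

    def write(self, key, record):
        self.records[key] = record


class Car:
    def __init__(self, model):
        self._model = model

    def storeOn(self, db):
        # The object chooses its own key and serialized form;
        # the db only needs to provide write().
        db.write(("car", self._model), {"model": self._model})


db = InMemoryDb()
Car("BMW").storeOn(db)
```

With `db.store(obj)` the database has to know the layout of every class it stores; with `obj.storeOn(db)` that knowledge stays inside the class.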

Edit: The example datamodel I show is a bit simple. My real aim is to create a graph-based datamodel (if anyone wants to participate in the project I would be honored). The third solution strongly encapsulates the written datamodel (as opposed to the in-memory one) and masks the backend, but what worries me is that it risks blowing up, as there is only one central class with all the methods exposed. I must be honest, I don't like the third case, but I thought of it as a possible solution, so I wanted to put it on the table. There could be some good in it.

Edit 2 : added the inverted logic entry

A: 

It's hard to say, because your example is obviously contrived.

The decision needs to be made based on how often your data model will change. IMHO cars don't often gain new kinds of parts, so I would go with a static model in the database for all the items you wish to model, plus a table linking them together. That may be wrong for what you are actually doing, though.

I'd suggest you should talk to us about the actual data you need to model.

Noon Silk
The actual data model I am implementing is a graph data model. The current graph based datamodels do not fit my needs, so I am implementing one myself.
Stefano Borini
What I am undecided about is whether I should let the client code manipulate objects directly or through the database interface by means of identifiers.
Stefano Borini
Well, in general it's better to have the class manipulate itself based on outside data. But still, it isn't exactly clear to me what you are doing, so I can't really say anything useful. Maybe someone else can :)
Noon Silk
+1  A: 

The first design is most compatible with Domain-Driven Design. Keeping the persistence implementation fully private to an object means you can use the object without regard to its relational representation. It can be helpful for an object to expose only methods that relate to its domain-specific behavior, not low-level CRUD operations. The high-level methods are the only API contract you want to offer to consumers of that object (i.e. you don't want just anyone to be able to delete the car). You can implement complex data relationships and code them in only one place.

The second design can be used with the Visitor pattern. A car object knows what parts of it need to be persisted, but it doesn't have a connection to the database. So you pass the car object to a database connection instance. Internally the db knows how to call an object it's given. Presumably the car implements some "db-callable" interface.

The third design is helpful for implementing an Adapter pattern. Every database brand's API is different, and every flavor of SQL is slightly different. If you have a generic API to plain database operations, you can encapsulate those differences, and swap out a different implementation that knows how to talk to the respective brand of database.
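A small sketch of that adapter idea: client code talks to one backend-neutral interface, and each backend hides its own quirks behind it. The `CarStore` interface and both implementations are hypothetical names made up for this example.

```python
import sqlite3
from abc import ABC, abstractmethod


class CarStore(ABC):
    """Hypothetical backend-neutral interface."""

    @abstractmethod
    def add_car(self, model):
        """Store a car and return its id."""

    @abstractmethod
    def car_model(self, car_id):
        """Return the model of the car with the given id."""


class DictCarStore(CarStore):
    """Backend that keeps everything in a dict."""

    def __init__(self):
        self._cars = {}

    def add_car(self, model):
        car_id = len(self._cars) + 1
        self._cars[car_id] = model
        return car_id

    def car_model(self, car_id):
        return self._cars[car_id]


class SqliteCarStore(CarStore):
    """Backend that hides the SQL dialect behind the same two methods."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS car (id INTEGER PRIMARY KEY, model TEXT)"
        )

    def add_car(self, model):
        cur = self._conn.execute("INSERT INTO car (model) VALUES (?)", (model,))
        self._conn.commit()
        return cur.lastrowid

    def car_model(self, car_id):
        row = self._conn.execute(
            "SELECT model FROM car WHERE id = ?", (car_id,)
        ).fetchone()
        return row[0]


def register_and_read(store, model):
    # Client code depends only on the CarStore interface.
    return store.car_model(store.add_car(model))


results = [register_and_read(s, "BMW") for s in (DictCarStore(), SqliteCarStore())]
```

Swapping the backend then means passing a different `CarStore` instance, with no change to `register_and_read`.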

Bill Karwin
my worry is that the third will eventually grow beyond manageability.
Stefano Borini
I wouldn't use the third strategy by itself. Use the adapter to make SQL generation easier, but then use that plan in combination with one of the first two.
Bill Karwin