I am trying to come up with a good design for the storage of a data model. The language is Python, but I guess the question is fairly language-agnostic.
At the moment I envision three possible strategies:
Object database
The data model is a network of objects. When I define them, I declare them as subclasses of a persistence base class. Example:
```python
class PersistentObject:
    """Placeholder for the persistence base class (hypothetical here)."""


class Tyres(PersistentObject):
    def __init__(self, brand):
        self._brand = brand


class Car(PersistentObject):
    def __init__(self, model):
        self._model = model
        self._tyres = None

    def setTyres(self, tyres):
        self._tyres = tyres

    def model(self):
        return self._model
```
The client code is not aware of persistence: it manipulates the objects as if they were plain in-memory objects, and the persistence base class takes care of everything behind the scenes. Retrieval is done via a keyed lookup on a database object. This is the approach used by the Zope Object Database (ZODB), among many others. The advantages are lazy loading, and the fact that only the objects that actually changed are written back, while untouched ones are never even retrieved.
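To illustrate the "only changed objects are written" part, here is a toy sketch of transparent change tracking. `PersistentObject`, `Database`, `root`, `storage`, `writes` and `_changed` are all hypothetical names for this illustration (ZODB's real base class is `persistent.Persistent`, and it does far more). The sketch only shows the write side; it does not attempt lazy loading.

```python
class PersistentObject:
    """Hypothetical persistence base class: flags itself as changed on every attribute write."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != "_changed":
            # Any real attribute write marks the object as needing a save.
            object.__setattr__(self, "_changed", True)


class Database:
    """Toy object store: a keyed root plus a dict standing in for the disk."""

    def __init__(self):
        self.root = {}     # key -> live object, as the client code sees it
        self.storage = {}  # key -> saved (shallow) copy of the object's attributes
        self.writes = 0    # how many object states have been written so far

    def commit(self):
        # Write back only the objects flagged as changed; skip the rest.
        for key, obj in self.root.items():
            if getattr(obj, "_changed", False):
                self.storage[key] = {k: v for k, v in vars(obj).items()
                                     if k != "_changed"}
                obj._changed = False
                self.writes += 1


class Tyres(PersistentObject):
    def __init__(self, brand):
        self._brand = brand


db = Database()
db.root["t1"] = Tyres("Bridgestone")
db.root["t2"] = Tyres("Michelin")
db.commit()                        # both new objects are written
db.root["t1"]._brand = "Pirelli"   # plain attribute assignment, no explicit save call
db.commit()                        # only t1 is written again
print(db.writes)                   # 3
```

The client code never calls a save method on the object itself; the base class intercepts attribute writes, which is the essence of this first strategy.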
Shelving objects
The data model given above lives in memory, and a database is then used to push or pull data as monolithic entities. For example:
```python
car = Car("BMW")
tyres = Tyres("Bridgestone")
car.setTyres(tyres)
db.store(car)
```
This is what a pickle-based solution does. In some sense it is similar to the previous approach, the only difference being that you store the object as a single bundle and retrieve it again as a single bundle.
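A minimal sketch of this strategy using the standard-library `shelve` module (a pickle-backed persistent dictionary). The classes are simplified versions of the ones above, and the temporary file path is just for the example:

```python
import os
import shelve
import tempfile


class Tyres:
    def __init__(self, brand):
        self.brand = brand


class Car:
    def __init__(self, model):
        self.model = model
        self.tyres = None


# shelve may append an extension to this name, depending on the dbm backend.
path = os.path.join(tempfile.mkdtemp(), "cars")

car = Car("BMW")
car.tyres = Tyres("Bridgestone")

with shelve.open(path) as db:
    db["car1"] = car        # the whole graph (car + tyres) is pickled as one value

with shelve.open(path) as db:
    loaded = db["car1"]     # ...and unpickled again as one value

print(loaded.model, loaded.tyres.brand)  # BMW Bridgestone
```

The monolithic nature shows up on updates: changing `loaded.tyres.brand` does not touch the shelf until `db["car1"]` is assigned again (or the shelf is opened with `writeback=True`), and that assignment re-pickles the whole car.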
The Facade
A single database class with convenience methods. Client code never handles objects, only ids. Example:
```python
class Database:
    def __init__(self):
        # set up the connection
        ...

    def createCar(self, model):
        # creates the car, returns a numeric key car_id
        ...

    def createTyresForCar(self, car_id, brand):
        # creates the tyres for car_id, returns a numeric id tyres_id
        ...

    def getCarModel(self, car_id):
        # returns the car model from the car identifier
        ...

    def getTyresBrand(self, car_id, tyres_id):
        # returns the tyre brand for tyres_id in car_id.
        # If tyres_id is not in car_id, raises an error.
        # Apparently redundant, but it's not guaranteed that
        # tyres_id uniquely identifies the tyres in the database.
        ...
```
This solution is rather controversial. The database class can accumulate a lot of responsibilities, but I kind of have the feeling that this is the philosophy used in SOAP: you don't get to manipulate an object directly, you send queries for object properties to a remote server. In the absence of SQL, this would likely be the interface to a relational database: `db.createTable()`, `db.insert()`, `db.select()`. SQL simplifies this into a very small db interface, `db.query(sql_string)`, at the price of parsing and executing a language (SQL). You still get to operate only on the parts of the data model you are interested in, without touching the others.
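To make the trade-off concrete, here is a minimal sketch of the facade backed by the standard-library `sqlite3` module. The SQLite backend and the two-table schema are my assumptions, not part of the design above; the point is that callers only ever exchange plain ids and values, and the "tyres must belong to this car" check in `getTyresBrand` becomes a single `WHERE` clause.

```python
import sqlite3


class Database:
    """Facade sketch: client code only ever sees integer ids (SQLite backend assumed)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE car (id INTEGER PRIMARY KEY, model TEXT NOT NULL)")
        self._conn.execute(
            "CREATE TABLE tyres (id INTEGER PRIMARY KEY, "
            "car_id INTEGER NOT NULL REFERENCES car(id), brand TEXT NOT NULL)")

    def createCar(self, model):
        cur = self._conn.execute("INSERT INTO car (model) VALUES (?)", (model,))
        return cur.lastrowid

    def createTyresForCar(self, car_id, brand):
        cur = self._conn.execute(
            "INSERT INTO tyres (car_id, brand) VALUES (?, ?)", (car_id, brand))
        return cur.lastrowid

    def getCarModel(self, car_id):
        row = self._conn.execute(
            "SELECT model FROM car WHERE id = ?", (car_id,)).fetchone()
        if row is None:
            raise KeyError(car_id)
        return row[0]

    def getTyresBrand(self, car_id, tyres_id):
        # Filtering on both ids enforces "tyres_id must belong to car_id".
        row = self._conn.execute(
            "SELECT brand FROM tyres WHERE id = ? AND car_id = ?",
            (tyres_id, car_id)).fetchone()
        if row is None:
            raise KeyError((car_id, tyres_id))
        return row[0]


db = Database()
car_id = db.createCar("BMW")
tyres_id = db.createTyresForCar(car_id, "Bridgestone")
print(db.getCarModel(car_id), db.getTyresBrand(car_id, tyres_id))  # BMW Bridgestone
```

The backend is fully hidden, but every new entity type (engines, wheels and so on) adds another group of methods to this one class, which is exactly where the single central class starts to grow.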
I would like to ask your opinion about the three designs, and in particular the third. When is it a good design, if ever?
The inverted logic
This is something I've seen in the MediaWiki code. Instead of having something like

```
db.store(obj)
```

they have

```
obj.storeOn(db)
```
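One way to read the inverted form: the database exposes only generic primitives, and each class decides how its own fields map onto them, delegating to its parts. A sketch in Python (MediaWiki itself is PHP); `DictBackend`, `put`, `key` and the table names are hypothetical, chosen for illustration only:

```python
class DictBackend:
    """Hypothetical backend exposing only a generic primitive: put(table, key, fields)."""

    def __init__(self):
        self.rows = {}

    def put(self, table, key, fields):
        self.rows[(table, key)] = dict(fields)


class Tyres:
    def __init__(self, key, brand):
        self.key = key
        self.brand = brand

    def storeOn(self, db):
        # The object decides which of its fields are persisted and where.
        db.put("tyres", self.key, {"brand": self.brand})


class Car:
    def __init__(self, key, model, tyres=None):
        self.key = key
        self.model = model
        self.tyres = tyres

    def storeOn(self, db):
        # Store itself (referencing the tyres by key), then delegate to its part.
        db.put("car", self.key, {"model": self.model,
                                 "tyres": self.tyres.key if self.tyres else None})
        if self.tyres is not None:
            self.tyres.storeOn(db)


db = DictBackend()
Car(1, "BMW", Tyres(10, "Bridgestone")).storeOn(db)
print(sorted(db.rows))  # [('car', 1), ('tyres', 10)]
```

With `db.store(obj)`, the database has to know the layout of every class (or fall back on generic serialization such as pickle); with `obj.storeOn(db)`, that knowledge lives in each class and the database interface stays small.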
Edit: The example data model I show is a bit simplistic. My real aim is to create a graph-based data model (if anyone wants to participate in the project, I would be honored). The third solution strongly encapsulates the written data model (as opposed to the in-memory one) and hides the backend, but what worries me is that it risks blowing up, since there is only one central class with all the methods exposed. To be honest, I don't like the third option, but I thought of it as a possible solution, so I wanted to put it on the table. There could be some good in it.
Edit 2: added the inverted logic entry.