Have you ever faced the following situation: you need to store information, but part of it is well modeled by one type of database (in a very loose sense) and another part by a different type? Examples:

  • a set of files, with additional information about each of them stored in a relational SQL database.
  • an OODB together with a triplestore.
  • two previously unrelated key/value data stores that must be integrated but kept separate.

What do you think is the best way to deal with this kind of situation? Keep the two types of data separate and write a software layer that keeps them synchronized? Or use only one kind of database, adapting one kind of data to the other (e.g. storing the files in the relational DB as blobs, or storing the relational part in a hacked-up file-based database on disk)?

+1  A: 

I don't think "merging" the two worlds would be a good thing (performance, maintenance and so on). The first option looks good to me: keep them separate, and isolate them from the business logic with a layer. Working with loosely coupled layers has many benefits. You could achieve this via design patterns or by working with interfaces/abstract classes.
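As a rough illustration of the "layer plus interfaces" idea, here is a minimal Python sketch for the files-plus-SQL example from the question. Everything in it is an assumption made for the example: the `DocumentStore` interface, the `FileAndSqlStore` class, the `file_meta` table and the use of SQLite are all hypothetical names and choices, not something from the question.

```python
import sqlite3
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Tuple


class DocumentStore(ABC):
    """Hypothetical interface: the only thing the business logic talks to."""

    @abstractmethod
    def save(self, name: str, content: bytes, author: str) -> None:
        ...

    @abstractmethod
    def load(self, name: str) -> Tuple[bytes, str]:
        ...


class FileAndSqlStore(DocumentStore):
    """File contents live on disk; per-file metadata lives in a SQL table."""

    def __init__(self, root: Path, conn: sqlite3.Connection) -> None:
        self.root = root
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS file_meta "
            "(name TEXT PRIMARY KEY, author TEXT)"
        )

    def save(self, name: str, content: bytes, author: str) -> None:
        # Write the file first, then the metadata row. A real layer would
        # also have to handle a failure between these two steps.
        (self.root / name).write_bytes(content)
        with self.conn:  # commits on success, rolls back on an exception
            self.conn.execute(
                "INSERT OR REPLACE INTO file_meta (name, author) VALUES (?, ?)",
                (name, author),
            )

    def load(self, name: str) -> Tuple[bytes, str]:
        row = self.conn.execute(
            "SELECT author FROM file_meta WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return (self.root / name).read_bytes(), row[0]


def demo() -> Tuple[bytes, str]:
    """Round-trip one document through both backends."""
    with tempfile.TemporaryDirectory() as tmp:
        store = FileAndSqlStore(Path(tmp), sqlite3.connect(":memory:"))
        store.save("report.txt", b"hello", "alice")
        return store.load("report.txt")
```

The business logic depends only on `DocumentStore`, so either backend can be swapped out later without touching the callers. Keeping the two stores consistent when one write fails is the hard part, and this sketch does not solve it.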

m.bagattini
+1  A: 

This kind of problem is what federated database systems address. I'd recommend reading the article on federated databases on Wikipedia.

It's not an easy situation, and the solution depends a lot on how tightly the data in your different "databases" are coupled/related and on how similar their schemas are.

MicSim
+1  A: 

You describe a problem solved by Virtual Database Engines (also referred to as Federated DBMS Engines).

I suspect the ideal situation for you is a Conceptual Layer that sits atop disparate logical sources, which could be any combination of: relational DBMS engines (behind ERP, CRM, HR, Accounting systems), web services, XML, etc.

Virtuoso (my company's product) handles this by allowing you to attach external/remote data sources associated with a myriad of data representation formats (as per the list above). It then allows you to use an EAV/CR model (e.g. the RDF Graph Model) as the basis for a Conceptual Layer that is both concrete and the focus of all subsequent data interaction. This conceptual layer endows each data item with an HTTP scheme based Identifier; thus, you only need an HTTP-aware user agent to start exploring the rich conceptual graph that now fronts your disparate logical data sources.

What I describe above is basically what's commonly known today as: HTTP based Linked Data.
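To make the entity-attribute-value idea concrete, here is a small Python sketch that exposes rows of a relational table as (subject, predicate, value) triples with HTTP-style identifiers. It only illustrates the general mapping and is not how Virtuoso implements it; the `BASE` namespace, the `rows_to_triples` helper and the URI layout are all made up for the example.

```python
import sqlite3
from typing import Any, List, Tuple

BASE = "http://example.org/"  # hypothetical namespace for the identifiers

Triple = Tuple[str, str, Any]


def rows_to_triples(
    conn: sqlite3.Connection, table: str, key_column: str
) -> List[Triple]:
    """Turn every non-NULL, non-key cell of `table` into one triple.

    `table` is interpolated into the SQL text, so it must be a trusted
    identifier, never user input.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples: List[Triple] = []
    for row in cur:
        record = dict(zip(columns, row))
        # One HTTP identifier per row, built from the table name and key.
        subject = f"{BASE}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples
```

Triples produced this way could then be loaded into a triplestore or queried next to triples from other sources, which is the role the conceptual layer plays in the answer above.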

Links:

  1. http://virtuoso.openlinksw.com

Kingsley

Kingsley Idehen
I am right now in the process of combining a triplestore (via rdflib, with a MySQL backend) with a legacy MySQL db. Thanks for the pointers, I will dig into them.
Stefano Borini