views:

228

answers:

5

Is there a database out there that gives you the benefit of referential integrity and being able to use a SQL type language for querying, but also lets entities be loosely defined with respect to their data attributes and also the relationships between them?

E.g. take a RBAC type model where you have Permissions, Users, User Groups & Roles. A complex/flexible model could have the following rules:

  • Roles can have one or more permissions and a permission can belong to one or more Roles
  • Users can have one or more permissions and a permission can belong to one or more Users
  • Users Groups can have one or more permissions and a permission can belong to one or more Users Groups
  • Users can have one or more roles and a role can belong to one or more Users
  • User Groups can have one or more roles and a role can belong to one or more User Groups
  • Roles can have one or more roles and a role can belong to one or more Roles

To model the above in an RDBMS would involve the creation of lots of intersection tables. Ideally, all I'd like to define in the database is the entities themselves (User, Role, etc) plus some mandatory attributes. Everything else would then be dynamic (i.e. no DDL required), e.g. I could create a User with a new attribute which wasn't pre-defined. I could also create a relationship between entities that hasn't been predefined, though the database would handle referential integrity like a normal RDBMS.

The above can be achieved to some degree in a RDBMS by creating a table to store entities and another one to store relationships etc, but this overly complicates the SQL needed to perform simple queries and may also have performance implications.

+5  A: 

Most NoSQL databases are built to scale very well. This is done at the cost of consistency, of which referential integrity is part of. So most NoSQL don't support any type of relational constraints.

There's one type of NoSQL database that does support relations. In fact, it's designed especially for relations: the graph database. Graph databases store nodes and explicit relations (edges) between these nodes. Both nodes and edges can contain data in the form of key/value pairs, without being tied to a predefined schema.

Graph databases are optimized for relational queries and nifty graph operations, such as finding the shortest path between two nodes, or finding all nodes within a given distance from the current node. You wouldn't need this in a role/permission scenario, but if you do, it'll be a lot harder to achieve using an RDBMS.

Another option is to make your entire data layer a hybrid, by using a RDBMS to store the relations and a document database to store the actual data. This would complicate your application slightly, but I don't think it's such a bad solution. You'll be using two different technologies, both dealing with the problems they were designed to deal with.

Niels van der Rest
On the contrary, graph databases generally support heavily optimised join-oriented queries. I don't have time to give this question the time it deserves tonight, I'll see what I can do tomorrow. In the meantime, other than that small inaccuracy, this answer is pretty good. +1
Recurse
@Recurse: Thanks, I've removed the inaccurate statement, as I wasn't sure about that in the first place. Looking forward to your answer!
Niels van der Rest
do any graph databases support a sql type lanquage?
SeriousCallersOnly
+1  A: 

There's the Gremlin language, supported by the Neo4j graph database. Regarding your example, have a look at Access control lists the graph database way and here. There's also a web-based tool including a REST API to Neo4j and a Gremlin console, see neo4j/webadmin.

nawroth
+1 for the informative examples.
Niels van der Rest
+4  A: 

Given the requirements you specify in your question, a graph database is probably the sort of thing you are looking for, but there are other options. As @Niels van der Rest said, the two constraints of "no a priori schema" and "referential integrity" are very hard to reconcile. You might be able to find a Topic-Map based database that might do so, but I'm not familiar with specific implementations so I couldn't say for sure.

If you decide you really can't do without referential integrity, I fear you probably are stuck with an RDBMS. There are some tricks you can use that might avoid some of the problems you anticipate, I cover a couple in http://stackoverflow.com/questions/3395606..., which might give you some ideas. Still, for this sort of data-model requiring dynamic, post-priori schema, with meta-schema elements, an RDBMS is always going to be awkward.

If you are willing to forgo referential integrity, then you still have three approaches to consider.

  1. Map/Reduce - in two flavours: distributed record-oriented (think, MongoDB), and column-oriented (think, Cassandra). Scales really really well, but you won't have your SQL-like syntax; joins suck; and matching your architecture to your specific query types is critical. In your case your focus on the entities and their attributes, rather than the relationships between the entities themselves, so I would probably consider a distributed record-oriented store; but only if I expected to need to scale beyond a single node—they do scale really really well.

  2. Document-store - technically in two flavours, but one of them is a distributed record-oriented map/reduce datastore discussed above. The other is an inverted-index (think, Lucene/Solr). Do NOT disregard the power of an inverted-index; they can resolve obscenely complex record predicates amazingly fast. What they can't do is handle well is queries that include correlation or large relational joins. Still, you will be amazed at the incredible flexibility, sufficiently complex record predicates gives you.

  3. Graph-store - come in a few flavours the first is the large-scale, ad-hoc key-value store (think, DBM/TokyoTyrant); the second is the tuple-space (think, Neo4j); the third is the RDF database (think, Sesame/Mulgara). I have a soft-spot for RDF, having helped develop mulgara, so I am not the most objective commenter. Still, if your scalability constraints will permit you to use an RDF-store, I find the inferencing permitted by RDF's denotational semantics (rare amongst noSQL datastore options) invaluable.

Recurse
Regrading referential integrity Neo4j has ACID transactions, and guarantees that there are no dangling relationships. It's impossible to by accident remove a node that still has relationships. And if you choose to go with RDF, there's actually an RDF implementation on top of Neo4j as well. See the [Neo4j components list](http://components.neo4j.org/).
nawroth
A: 

You may want to check out MongoDB it is a document based database and so has a flexible schema. It is awesome and worth the time to see if it would suite your needs.

Matt
+1  A: 

Some NoSQL solutions support security and SQL. One of these is OrientDB. The security system is (quite) well explained here.

Furthermore supports SQL.

Lvca
looks interesting, thanks
SeriousCallersOnly