views:

121

answers:

3

Relational databases are frequently used to store graphs in all their many flavors (trees, directed graphs, undirected graphs, ...).

Why then do none of the major DBMSs (Microsoft, MySql, Oracle, PostgreSQL, SqlLite, just to name a few in alphabetical order) include library support for treating relations as graphs?

Some desirable features, by way of example:

  • Constraint checking (connectedness, acyclicity, planarity, ...)
  • Commonly needed functions (shortest path, minimum spanning tree, transitive closure, max flow/min cut, clique detection, Hamiltonian/Eulerian cycles ...)
  • Auxiliary data structures needed to improve performance for any of the above

Building support for some of these things outside the database is complicated because (among other reasons):

  • It's inherently complicated (libraries help here)
  • Short answers are often supported by lots of data: an external client running a shortest path algorithm would need to either be very "chatty" with the database or would need to retrieve a much-larger-than-needed amount of data; either choice is bad for the network
  • Maintaining integrity when integrity depends on a graph-theoretic constraint requires access to all proposed updates, hence a trigger, and access to existing graph libraries from triggers is complicated in many systems
  • The DBMS storage manager and optimizer are uniquely positioned to address the question of auxiliary data structures, as they do with indexes

This isn't a rhetorical question, I actually want to know if there are interesting technical (or historical) reasons.

A: 

Oracle has support for graph functionality (Oracle Locator/Oracle Spatial) and semantic web functionality.

tuinstoel
Ooh, nice. I have never had a chance to play with Oracle 11. This looks good (if spatially-focused) so far from the docs I am reading. (see http://www.oracle.com/technology/products/spatial/pdf/10gr2_collateral/spatial_twp_ntwrkdatamod_10gr2_0512.pdf among others)
Doug McClean
Well, both PostgreSQL and mySQL can record geometries which could include multi-polygons representing graphs. But neither really supports graphs and operations on them -- for instance neither has the facility to ensure that a geometry stored represents a connected graph.
High Performance Mark
Yeah, and those common connectedness, planarity, acyclicity, etc conditions can take a long time to check without appropriate auxiliary data structures. But it's true that this functionality is often wrapped up in libraries whose API only supports geographic queries.
Doug McClean
A: 

I've worked in a research group, interested among others in delevoping a database for RDF(S) data, which is basically labeled graphs, or triples [subject, predicate, object], which are basically graph edges: [sourceNode, edgeLabel, targetNode].

The question to ask, to appreciate the hardness of the problem: What kind of indices are you going to build for a labeled graph? You have to take advantage of common "properties" (each "predicate" is a property of the subject, with the value of object), and index edges accordingly, so you can quickly find if "there is an edge called 'hasAge' on a Person with value greater than 18".

For illustration, here is a simple approach, which is schema-oblivious (and goes quite to the opposite direction of traditional database research which quite unanimously agrees that schemata are good to have). It completely ignores any schema information (this paper provides useful context). Just store everything in three big tables (s: subject, p: predicate, o: object):

  1. [s, p, o]
  2. [p, o, s]
  3. [o, s, p]

These three suffice to answer any efficiently evaluate any query with (at most) a subject, (at most) a predicate, and (at most) an object (i.e. queries of the form (s, *, *), (*, p, *), (*, *, o), (s, p, *), (s, *, o), (*, p, o), (s, p, o)). Complex queries though consist of many "path expressions" (i.e. you describe data for which you can find certain paths satisfying some criteria), each of which is translated to a self-join on one of these (big!) tables, which is not all that efficient, which is a problem.

There, that's a simple graph database in a pocket. :)

Concluding, this is the field of active research. I'm not up to date with the current state of art, but I've seen products like AllegroGraph and others that claim very good results.

Dimitris Andreou
A: 

Hi

I suspect that your question contains the beginnings of its own answer.

The commonly needed functions you list are not, for general purpose databases, commonly needed at all. Yes they are certainly needed for graph operations, but rarely for, say, customer billing. Of course, a relational database can store graphs in tables but the graph operations are beyond the powers of any version of SQL I've seen.

You write Building support for some of these things outside the database is complicated. True that and it's why we all get paid so much. But it would be just as complicated to build support for those things into the database wouldn't it ?

Regards

Mark

High Performance Mark
Not for customer billing? They sure are, if you have a recursively defined product and/or order schema (based on subparts).It would be just as complicated. But it would be done a few times by tightly quality controlled engineering teams instead of many, many, many times by all of us. Furthermore, putting it in the database has potential for absolutely enormous performance improvements. Short queries with short answers can involve huge amounts of data, and hence bandwidth if done externally. There are many graph algorithms that benefit from secondary data structures (analogous to indices).
Doug McClean
shortest path, minimum spanning tree, transitive closure, max flow/min cut, clique detection, Hamiltonian/Eulerian cycles -- for customer billing !
High Performance Mark

related questions