I need to query my local Hibernate-managed datastore for persisted objects, based on criteria where the relevant data for the WHERE clause is in the Linked Open Data cloud.

Is there a way to read a Hibernate Session as RDF? If so, I can at least use the combined Dataset to decide what objects to retrieve from Hibernate.

Preferably the solution would expose the Hibernate Session inside Jena, as I'm familiar with it. In addition I'll need support for RDFS inferencing and SPARQL for retrieval.

+1  A: 

Since you will be using Jena (and ARQ) to execute your SPARQL query, you could use a custom FileManager to resolve the Hibernate objects/graphs (assuming you want each object to be represented by a graph).

Jena has a short HOWTO on using a FileManager to locate models, and the ARQ RDF dataset tutorial (see the "Describing Datasets" section) may give some hints as to how to do a more customized mapping of graph URIs (and the contents of those graphs as RDF datasets) to your existing Hibernate-managed data.
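A pseudocode-style sketch of that FileManager approach, assuming Jena 2.x's Locator/TypedStream types (names may differ across versions), a "hibernate:" URI prefix chosen for the example, and a hypothetical triplify helper:

```
public class HibernateLocator implements Locator {
    private final SessionFactory sessionFactory;

    public HibernateLocator(SessionFactory sf) { this.sessionFactory = sf; }

    public TypedStream open(String filenameOrURI) {
        if (!filenameOrURI.startsWith("hibernate:")) {
            return null; // not ours; the FileManager tries the next Locator
        }
        // Load the entity, serialise it to RDF/XML in memory, and hand the
        // bytes to Jena as though they had been read from a file.
        byte[] rdf = triplify(filenameOrURI); // hypothetical helper
        return new TypedStream(new ByteArrayInputStream(rdf), "application/rdf+xml");
    }

    public String getName() { return "HibernateLocator"; }
}

// Registration and use:
// FileManager.get().addLocator(new HibernateLocator(sessionFactory));
// Model m = FileManager.get().loadModel("hibernate:EntityName/42");
```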

Another option might be creating a custom SDB layout which maps to your existing Hibernate schema. I don't know how flexible SDB is in this regard.

Phil M
Assuming I don't use SDB, how would the actual Hibernate-managed data get into the graphs? Assuming one graph per SessionFactory, I can obviously implement Graph myself, but I'd rather not have to.
Simon Gibbs
A: 

Here's what I've found since posting the question:

There is no existing tool to triplify the Hibernate Session specifically. To implement one myself I would need to implement Graph, perhaps using GraphBase as a base class, or implement a StageGenerator. The answer to the question is therefore "there isn't one", so I went on to consider how to implement it.

I need to decide whether to triplify objects already in the Session (i.e. already accessed by some earlier query), rely on accessing the database, or do both. If going to the database, I also need to decide whether to load whole objects, which will then be attached to the Session, or to use projections, which save bringing extra data into the heap at the expense of additional round-trips.
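The two database strategies can be sketched with Hibernate 3's Criteria API; the Book entity and its properties are hypothetical:

```
// Projection: fetch only the columns needed for the triples; each row is
// an Object[] of values and nothing becomes attached to the Session.
List rows = session.createCriteria(Book.class)
        .setProjection(Projections.projectionList()
                .add(Projections.property("id"))
                .add(Projections.property("title")))
        .list();

// Whole objects: heavier on the heap, but the instances become attached
// to the Session and can answer later lookups without a round-trip.
List books = session.createCriteria(Book.class).list();
```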

Implementing Graph is apparently essential for supporting inferencing, though it's slower than using ARQ's StageGenerator, since the latter can evaluate a whole set of triple patterns at once. However, relying on a StageGenerator makes it essential to always use SPARQL, which seems a bit inflexible.
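A skeleton of such a Graph, assuming Jena 2.x's GraphBase (where graphBaseFind(TripleMatch) is the one method a read-only graph must supply) and a hypothetical triplesMatching helper:

```
public class HibernateGraph extends GraphBase {
    private final Session session;

    public HibernateGraph(Session session) { this.session = session; }

    @Override
    protected ExtendedIterator<Triple> graphBaseFind(TripleMatch m) {
        // 1. Answer from objects already in the PersistenceContext where possible.
        // 2. Otherwise fall back to a Criteria query against the database,
        //    paging the results behind the returned iterator.
        return WrappedIterator.create(triplesMatching(m)); // hypothetical helper
    }
}
```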

So far, the optimum solution appears to be:

  • Implement a (probably read-only) Graph, say "HibernateGraph".
  • Have HibernateGraph inspect the Hibernate PersistenceContext object and return triples at the head of a custom iterator.
  • When the iterator is exhausted, load pages of data from the database using the Criteria interfaces.
  • For queries with a known predicate URI, map the URI to a column and use a tight projection; otherwise load the whole object and iterate over its getters, mapping each getter name to a URI.
  • In other cases map using a simple scheme, e.g. http://root/url/instances/EntityName/id for each subject, etc.
  • Create a helper object to allow SPARQL to be performed with a custom StageGenerator.
  • The StageGenerator should wrap the built-in StageGenerator.
  • In the custom StageGenerator, pass queries on graphs other than a HibernateGraph up the chain to the built-in StageGenerator.
  • Also skip any set of triple patterns for which there is no optimised solution, e.g. any set containing a single pattern.
  • Where an optimised query can be achieved, run the appropriate Criteria functions and map the results cell-by-cell to triples as before.
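The mapping scheme in the list above can be sketched in plain Java; the UriMapper class, the Map-based "triple" representation and the fragment-based predicate URIs are illustrative only (the base URI follows the example scheme above):

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriMapper {
    static final String BASE = "http://root/url/instances/";

    // Subject URI: base + entity name + id
    static String subjectUri(String entityName, Object id) {
        return BASE + entityName + "/" + id;
    }

    // Predicate URI: map a getter name such as getTitle to .../EntityName#title
    static String predicateUri(String entityName, String getterName) {
        String prop = getterName.substring(3, 4).toLowerCase() + getterName.substring(4);
        return BASE + entityName + "#" + prop;
    }

    // Walk the getters of an object and emit (predicate URI -> value) pairs,
    // standing in for the triples a HibernateGraph would return.
    static Map<String, Object> triplify(Object bean) throws Exception {
        String entity = bean.getClass().getSimpleName();
        Map<String, Object> triples = new LinkedHashMap<>();
        for (Method m : bean.getClass().getMethods()) {
            if (m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && !m.getName().equals("getClass")) {
                triples.put(predicateUri(entity, m.getName()), m.invoke(bean));
            }
        }
        return triples;
    }
}
```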

There is another SPI, OpExecutor, which may help push FILTER resolution down into the database, improving performance further.

At the moment I've taken this on as a side project, which I may well release as LGPL software.

Simon Gibbs