tags:

views: 514
answers: 4

I like these a lot and would like to use them for everything. Why would using RDF triple stores for everyday programming entities/tables, such as Contacts, Customers, Companies, etc., be a bad idea?

Are there shortfalls in the technology that I have not come across? I was concerned about retrieving data, but I think this is covered by SPARQL.

+2  A: 

There is no one-size-fits-all tool. Triple stores are appropriate and usable today for some kinds of tasks and not for others.

A similar question was asked on semanticoverflow.com and the common answer was the same: "use whatever is appropriate".

Pēteris Caune
how many overflows are there? I think we need some buckets :)
WeNeedAnswers
No one size fits all. I tend to agree with that, although the attitude in the RDBMS world is that there is only one way to store data and retrieve it efficiently. RDBMSs tend to be inflexible due to the accidental constraints placed on them from the coding perspective. RDF would never be constrained in the same manner if used correctly, although speed might be a problem on retrievals.
WeNeedAnswers
+1  A: 

Further to Peteris's answer, there are some key differences between how you model data for a triple store and for other techniques like OOP, relational databases, and XML (rows, classes, properties, etc.).

Whether they are appropriate very much depends on what you want to do, and on whether you can find one with the right performance characteristics for your application.

People have a tendency to characterise triple stores as schema-less databases, but realistically, unless you are using some form of schema/ontology, they aren't particularly useful. If you want to use SPARQL to get data out, then there need to be some schema patterns in the store that you can write queries against.

Personally I would still use relational databases for a lot of things, and still do. While I'm using RDF and triple stores for an increasing amount of work, that doesn't mean I'm ready to throw out what works well.

As a final point, even if you go with a relational database for the time being, there are technologies like DB2RDF which can convert relational databases to RDF, so you can stick with a DB for now and export your database to RDF in the future as desired.

RobV
"Throwing out what works" - but isn't that what people are doing anyway? I mean the rise of the ORM, and domains taking the place of ERDs and entities. Why not look at the problem of the impedance mismatch, grab it by the horns, and choose option 3? I would always do a schema/domain model for anything of a certain size. A lot of the RDF groundwork has been done already, in such works as Dublin Core and other well-defined schemas. You could come up with your own, though; nothing is stopping you. I probably would for a domain solution.
WeNeedAnswers
I wouldn't characterise the ORM as a replacement for relational data models; it seems to be used primarily to provide a high-level abstraction layer for developers, which appears to be a more general trend in modern development.
RobV
The ORM, though, gets embedded into the development language, and then all your great work on entities gets lost in the code. The ORM is certainly not modern; it has been around a long, long time. The ORM is used today as a panacea for the database-object problem, not for the higher goal of abstraction. I would offer up triples as an alternative approach, as you can still use an ORM on top of a triple store.
WeNeedAnswers
point taken, linq2rdf being an example of an ORM on top of a triplestore
RobV
+1  A: 

Query times tend to be much slower than for conventional DBs, even for simple queries. Also, many RDF stores don't support standard DB features like transactions, crash recovery, and so on.

Carsten
Does that also go for the ones built on top of an RDBMS?
WeNeedAnswers
Regarding query speed, in my experience: yes. DBs are very good at exploiting well-designed schemas for queries. Unless an RDF store does a very clever, continuing analysis of the triples and maps them to a clever schema on the DB layer, it will never come close. Jena SDB, for example, does some clever caching of strings, but basically puts everything into a few simple tables. I'd expect them to be better at crash recovery, and I think I remember some of them supporting transactions.
Carsten
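The "few simple tables" point can be sketched with stdlib sqlite3 (a deliberately naive layout, not Jena SDB's actual schema): with one generic triple table, every additional property you fetch costs a self-join, whereas a conventional wide Contacts table would need none.

```python
import sqlite3

# A generic triple table, roughly how naive RDBMS-backed stores lay data out
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "type",  "Contact"),
    ("alice", "name",  "Alice"),
    ("alice", "email", "alice@example.org"),
])

# Fetching two properties of a Contact already needs two self-joins
rows = con.execute("""
    SELECT t1.o, t2.o
    FROM triples t0
    JOIN triples t1 ON t1.s = t0.s AND t1.p = 'name'
    JOIN triples t2 ON t2.s = t0.s AND t2.p = 'email'
    WHERE t0.p = 'type' AND t0.o = 'Contact'
""").fetchall()
print(rows)  # [('Alice', 'alice@example.org')]
```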
Thanks for your help. Would you use one then for "everyday joe" programming, or only on real cases where they would come into their own?
WeNeedAnswers
Unless you have queries which are much easier in SPARQL than SQL, I would not bother. If your main motivation is that you are not sure about what schema to use, have you considered a NoSQL DB? I personally have no experience with them, but they seem to be fashionable at the moment ...
Carsten
+1  A: 

One of the shortcomings we have come across in using RDF triple stores for general programming is that most engines don't support aggregation in queries (min, max, group by).

A checklist we use to decide between an RDBMS and RDF is the following.

RDBMS if

  • static schema
  • very large amount of data
  • no RDF export needed
  • Lucene support needed (easy via Hibernate Search for example)
  • strong data consistency requirements (money involved etc)

RDF if

  • no fixed schema, or a dynamic schema
  • small to large amount of data
  • RDF export needed
  • loose data consistency requirements

Refactoring RDBMS schemas for ongoing projects can be quite an overhead if you don't have the right tools.

Lucene support is provided by some RDF engines as well, but is not as well documented and supported as in the case of Hibernate Search.

Scalability of RDF engines is also improving steadily, with ideas from the NoSQL side being incorporated into RDF engines, but if you go with the standard engines, Jena and Sesame, this division is still quite valid.

Timo Westkämper
A reasonable checklist, but I'd add two amendments: (i) many RDF stores support Lucene indexes as well, and (ii) scalability of RDF stores is improving steadily, so I'd characterise the division as between "large" and "very large", not between "small" and "lots". Finally, though it's still in development, the next release of SPARQL will include aggregate functions. This in turn will drive the provision of aggregate functions in the RDF stores (noting that some already do). See http://www.w3.org/TR/sparql11-query/#aggregateFunctions.
Ian Dickinson
Thanks for the pointers, I updated the answer
Timo Westkämper