ansaurus

Question

Modeling atomic facts in a relational database

Answer 1

+2 A:

RDF is great for this. It's usually described as a format for metadata, but in fact it's a graph model of 'assertions' expressed as triples (subject, predicate, object).

The whole 'semantic web' idea is to publish lots of facts as RDF, so that search engines can act as inference engines that traverse the unified graph to find relationships.

There are also mechanisms for referring to a triple itself, so you can say something about an assertion, such as its origin (who says this?), when it was asserted (when did they say it?), or how much you believe it to be true.
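To make the idea of "saying something about an assertion" concrete, here is a minimal sketch in plain Python. It does not use any RDF library; the `Triple` and `Assertion` classes, the `claims_about` helper, and the source names are all hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One atomic fact: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Assertion:
    """A statement about a triple: who asserted it, when, and how strongly."""
    triple: Triple
    source: str         # hypothetical source identifier
    asserted_on: str    # ISO date string
    confidence: float   # 0.0 (no belief) to 1.0 (certain)


def claims_about(assertions, subject, predicate):
    """Return every assertion whose triple matches the subject and predicate."""
    return [
        a for a in assertions
        if a.triple.subject == subject and a.triple.predicate == predicate
    ]


# Two sources disagree about the same attribute of the same subject.
assertions = [
    Assertion(Triple("person:1", "kinship", "Abraham Lincoln"),
              source="source_a", asserted_on="2009-05-20", confidence=0.9),
    Assertion(Triple("person:1", "kinship", "nobody notable"),
              source="source_b", asserted_on="2009-05-21", confidence=0.2),
]

matching = claims_about(assertions, "person:1", "kinship")
best = max(matching, key=lambda a: a.confidence)
```

The fact and the statement about the fact are separate objects, so conflicting claims from different sources can coexist and be ranked later.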

As a big example, the whole OpenCyc 'commonsense knowledge base' is queryable as RDF.

Javier 2009-05-20 21:21:38

I looked into RDF, and I still might go that way, but there does not seem to be anything like an emerging consensus on how to do this. Reification seems to be precisely what I want, but SPARQL does not like it. Named graphs seem like a partial solution, but they still do not seem quite there.

fgregg 2009-05-20 21:33:42

Sure, where there's no consensus you have to make a choice; but not using any standard (even an incomplete one) means you have to make choices for _everything_. And if some consensus emerges, you might have to adapt, but not as much as if you had done it in a totally different way.

Javier 2009-05-20 22:09:27

Maybe it would be easier for you to work directly with a graph model of the data? I wrote a response to the post by Fowler mentioned in the question along this line of thought: http://blog.nawroth.com/2009/03/flexibility-in-data-modeling.html

nawroth 2009-05-20 22:11:37

@Javier, That's a fair point.

fgregg 2009-05-20 22:24:06

@nawroth: RDF is a graph model. The triple store and the XML are just representations of it. Once you internalize them, you're working on graphs.

Javier 2009-05-20 22:34:57

@Javier: I should have made myself clearer. There are other ways than RDF to work with a graph model, namely graph databases. Depending on what you want to do, the API of a graph database could be easier to work with than RDF. If a main concern is not tying the backend to a specific implementation, RDF is the way to go, I think.

nawroth 2009-05-20 22:51:41

Answer 2

A:

"This feels like it gets very complicated very quickly"

You're not kidding. Have a look at the work on ontology and knowledge representation.

Charlie Martin 2009-05-20 21:22:07

Answer 3

+1 A:

I think what you want to use is a "property bag". Instead of modeling each individual type of fact that you want to describe, you have a table which contains an ID, a "key" (in this case, the alleged information, such as "kinship") and a "value" (in this case, the alleged value, such as "Abraham Lincoln").

Then you have a second table which ties your claimants to that table, along with the level of confidence they have in that information. That table would simply have the ID of the source, the ID of the property, and the confidence that the source has in the information. That way, a source can have either a lot or a little information, differing sources can have differing levels of confidence in a given attribute, and there is no limit on how many different types of information you can store.

It's a pretty standard solution for situations such as yours where you have large amounts of optional information that you want to cross-reference.
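The two tables described above can be sketched with Python's built-in `sqlite3` module. The table names (`property`, `source`, `claim`), the column names, and the sample rows are assumptions made for illustration; the answer only specifies which pieces of information each table holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The property bag: one row per alleged (key, value) pair.
CREATE TABLE property (
    id         INTEGER PRIMARY KEY,
    prop_key   TEXT NOT NULL,
    prop_value TEXT NOT NULL
);
-- The claimants.
CREATE TABLE source (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Ties a source to a property, with that source's confidence in it.
CREATE TABLE claim (
    source_id   INTEGER NOT NULL REFERENCES source(id),
    property_id INTEGER NOT NULL REFERENCES property(id),
    confidence  REAL NOT NULL,
    PRIMARY KEY (source_id, property_id)
);
""")

conn.execute("INSERT INTO property VALUES (1, 'kinship', 'Abraham Lincoln')")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, "source_a"), (2, "source_b")])
conn.executemany("INSERT INTO claim VALUES (?, ?, ?)",
                 [(1, 1, 0.9), (2, 1, 0.4)])


def claims_for(conn, key):
    """List (source, value, confidence) for one key, most confident first."""
    return conn.execute(
        """
        SELECT s.name, p.prop_value, c.confidence
        FROM claim c
        JOIN source s   ON s.id = c.source_id
        JOIN property p ON p.id = c.property_id
        WHERE p.prop_key = ?
        ORDER BY c.confidence DESC
        """,
        (key,),
    ).fetchall()


rows = claims_for(conn, "kinship")
```

Adding a new kind of fact means inserting a row with a new key rather than altering the schema, which is the main appeal of this layout. The trade-off is that every value is stored as text and the database cannot enforce per-key types.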

McWafflestix 2009-05-20 21:22:42

Answer 4

+3 A:

You should try a Star Schema model, centered around a "Fact" table and several "Dimension" tables. This is a well-explored model, and there are many database optimizations for it.

claim_fact(source_id, person_id, user_id, details_id, weight)

source_dimension(id, name)

person_dimension(id, name)

user_dimension(id, name)

details_dimension(id, name NOT NULL, color NULLABLE, kinship NULLABLE, birthday NULLABLE)

Every claim would have a source, person, user, and details. The name column of details_dimension would hold values such as "kinship" or "birthday".

Keep in mind that this is an OLAP schema (rather than an OLTP structure), and as such it is not fully normalized. The benefits usually outweigh any problems you may come across due to redundancy, as queries against star schemas are highly optimized by DBMSs configured for data warehousing.
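As a rough sketch of how the schema above might be created and queried, the following uses Python's built-in `sqlite3` module. SQLite is not a data-warehousing DBMS, so this only demonstrates the shape of a star-schema query, not the optimizations mentioned above. The column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_dimension (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person_dimension (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_dimension   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE details_dimension (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,   -- which kind of detail this row describes
    color    TEXT,            -- nullable
    kinship  TEXT,            -- nullable
    birthday TEXT             -- nullable
);
-- The fact table: one row per claim, pointing at each dimension.
CREATE TABLE claim_fact (
    source_id  INTEGER NOT NULL REFERENCES source_dimension(id),
    person_id  INTEGER NOT NULL REFERENCES person_dimension(id),
    user_id    INTEGER NOT NULL REFERENCES user_dimension(id),
    details_id INTEGER NOT NULL REFERENCES details_dimension(id),
    weight     REAL NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO source_dimension VALUES (?, ?)",
                 [(1, "source_a"), (2, "source_b")])
conn.execute("INSERT INTO person_dimension VALUES (1, 'person_a')")
conn.execute("INSERT INTO user_dimension VALUES (1, 'user_a')")
conn.executemany(
    "INSERT INTO details_dimension VALUES (?, ?, ?, ?, ?)",
    [(1, "kinship", None, "Abraham Lincoln", None),
     (2, "color", "blue", None, None)],
)
conn.executemany(
    "INSERT INTO claim_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 0.5), (2, 1, 1, 1, 0.25), (1, 1, 1, 2, 1.0)],
)

# Typical star-schema query: join the fact table to its dimensions and
# aggregate. Here: claim count and average weight per person and detail.
summary = conn.execute("""
    SELECT p.name, d.name, COUNT(*), AVG(f.weight)
    FROM claim_fact f
    JOIN person_dimension p  ON p.id = f.person_id
    JOIN details_dimension d ON d.id = f.details_id
    GROUP BY p.name, d.name
    ORDER BY p.name, d.name
""").fetchall()
```

Every query starts from `claim_fact` and joins outward to whichever dimensions it needs, which is the access pattern that warehouse-oriented engines are built to optimize.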

RECOMMENDED READING: The Data Warehouse Toolkit (Kimball, et al.)

2009-05-20 21:30:52
