





I have an RDF file that's used to track item revisions. Using this data I can trace the changes made to an item through its lifetime. Once a specific detail has changed, the corresponding data is stored as a new revision. Have a look:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mymeta: <http://www.mymeta.com/meta/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<urn:ITEMID:12345> rdf:type mymeta:item .
<urn:ITEMID:12345> mymeta:itemchange <urn:ITEMID:12345:REV-1> .
<urn:ITEMID:12345:REV-1> dc:title "Product original name"@en .
<urn:ITEMID:12345:REV-1> dc:issued "2006-12-01"@en .
<urn:ITEMID:12345:REV-1> dc:format "4 x 6 x 1 in"@en .
<urn:ITEMID:12345:REV-1> dc:extent "200"@en .

<urn:ITEMID:12345> rdf:type mymeta:item .
<urn:ITEMID:12345> mymeta:itemchange <urn:ITEMID:12345:REV-2> .
<urn:ITEMID:12345:REV-2> dc:title "Improved Product Name"@en .
<urn:ITEMID:12345:REV-2> dc:issued "2007-06-01"@en .

According to this data, there was an item revision on "2007-06-01" where only the item name was changed to "Improved Product Name". As you can see, "dc:format" and "dc:extent" are missing from the latest data revision. This is on purpose to avoid millions of duplicate records!

I can write a SPARQL query that shows me the latest product revision information (REV-2: dc:title and dc:issued), but it's missing "dc:format" and "dc:extent", which I want carried over from the last revision (REV-1).
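To make the expected result concrete, here is a minimal Python sketch of the carry-over, not a SPARQL solution. It assumes the two revisions are copied into a plain dictionary (`REVISIONS` and `effective_state` are hypothetical names, and the `@en` language tags are dropped), and it overlays the revisions oldest-first so that a newer value replaces an older one while untouched properties survive.

```python
# Hypothetical in-memory copy of the two revisions from the Turtle data above.
# Language tags (@en) are dropped for brevity.
REVISIONS = {
    "urn:ITEMID:12345:REV-1": {
        "dc:title": "Product original name",
        "dc:issued": "2006-12-01",
        "dc:format": "4 x 6 x 1 in",
        "dc:extent": "200",
    },
    "urn:ITEMID:12345:REV-2": {
        "dc:title": "Improved Product Name",
        "dc:issued": "2007-06-01",
    },
}


def effective_state(revisions):
    """Overlay revisions oldest-first so later values replace earlier ones."""
    state = {}
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
    for props in sorted(revisions.values(), key=lambda p: p["dc:issued"]):
        state.update(props)
    return state


if __name__ == "__main__":
    for prop, value in sorted(effective_state(REVISIONS).items()):
        print(prop, "=", value)
```

The result should hold the title and issue date from REV-2 together with the format and extent from REV-1, which is the output I would like the query to produce.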

How can I write a SPARQL query to do this? Any help much appreciated!

+1  A: 

Not sure you can do this in one query. I'll think more on it if I can, but the following two queries might get you started in the right direction:

1) Find the changes that don't have a format

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mymeta: <http://www.mymeta.com/meta/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

DESCRIBE ?change
WHERE {
    ?item a mymeta:item ;
          mymeta:itemchange ?change .
    ?change ?p ?o .
    # OPTIONAL leaves ?format unbound for changes that have no dc:format
    OPTIONAL { ?change dc:format ?format . }
    FILTER (!bound(?format))
}

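2) I think this will find the most recent change that does have a format

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mymeta: <http://www.mymeta.com/meta/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?change ?format ?issued
WHERE {
    ?item a mymeta:item ;
          mymeta:itemchange ?change .
    ?change dc:format ?format ;
            dc:issued ?issued .
    # Look for a later change of the same item that also has a format
    OPTIONAL {
        ?item mymeta:itemchange ?moreRecentChange .
        ?moreRecentChange dc:format ?moreRecentFormat ;
                          dc:issued ?moreRecentIssued .
        # str() drops the @en tag so the dates compare as plain strings
        FILTER (str(?moreRecentIssued) > str(?issued))
    }
    # Keep only the changes for which no such later change exists
    FILTER (!bound(?moreRecentIssued))
}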

With some more work it should be possible to limit the ?format of (2) to come from those changes with an issue date before the issue date of a result from (1). So for each row from (1) you'd execute (2) to find the format value to use. You might get better results, though, if you used a rule-based reasoning engine rather than SPARQL. I'd recommend EulerSharp or Pellet.
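The per-row procedure described above can be sketched outside SPARQL. This is only an illustration under assumptions, not a tested solution against a real triple store: the triples from the question are hard-coded as Python tuples with the language tags dropped, and the helper names `objects`, `changes_missing`, and `carried_value` are hypothetical.

```python
# Triples copied from the question's data; language tags dropped for brevity.
ITEM = "urn:ITEMID:12345"
REV1 = "urn:ITEMID:12345:REV-1"
REV2 = "urn:ITEMID:12345:REV-2"

TRIPLES = [
    (ITEM, "rdf:type", "mymeta:item"),
    (ITEM, "mymeta:itemchange", REV1),
    (REV1, "dc:title", "Product original name"),
    (REV1, "dc:issued", "2006-12-01"),
    (REV1, "dc:format", "4 x 6 x 1 in"),
    (REV1, "dc:extent", "200"),
    (ITEM, "mymeta:itemchange", REV2),
    (REV2, "dc:title", "Improved Product Name"),
    (REV2, "dc:issued", "2007-06-01"),
]


def objects(triples, subject, predicate):
    """Return every object of the (subject, predicate, ?o) pattern."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def changes_missing(triples, item, prop):
    """Step (1): changes of `item` that carry no value for `prop`."""
    return [c for c in objects(triples, item, "mymeta:itemchange")
            if not objects(triples, c, prop)]


def carried_value(triples, item, change, prop):
    """Step (2): value of `prop` from the latest change issued before `change`."""
    issued = objects(triples, change, "dc:issued")[0]
    candidates = []
    for other in objects(triples, item, "mymeta:itemchange"):
        other_issued = objects(triples, other, "dc:issued")[0]
        values = objects(triples, other, prop)
        # ISO 8601 dates compare chronologically as plain strings.
        if values and other_issued < issued:
            candidates.append((other_issued, values[0]))
    # max() picks the candidate with the latest issue date, if there is one.
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    for change in changes_missing(TRIPLES, ITEM, "dc:format"):
        print(change, "->", carried_value(TRIPLES, ITEM, change, "dc:format"))
```

The same two helpers work for dc:extent by passing a different property name, which is roughly what running (2) once per row of (1) would amount to.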

Bill Barnhill