Hi,

I have an RDF file that's used to track item revisions. Using this data I can trace back the changes made to an item through its lifetime. Whenever a specific detail changes, the changed data is stored as a new revision. Have a look:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mymeta: <http://www.mymeta.com/meta/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<urn:ITEMID:12345> rdf:type mymeta:item .
<urn:ITEMID:12345> mymeta:itemchange <urn:ITEMID:12345:REV-1> .
<urn:ITEMID:12345:REV-1> dc:title "Product original name"@en .
<urn:ITEMID:12345:REV-1> dc:issued "2006-12-01"@en .
<urn:ITEMID:12345:REV-1> dc:format "4 x 6 x 1 in"@en .
<urn:ITEMID:12345:REV-1> dc:extent "200"@en .

<urn:ITEMID:12345> rdf:type mymeta:item .
<urn:ITEMID:12345> mymeta:itemchange <urn:ITEMID:12345:REV-2> .
<urn:ITEMID:12345:REV-2> dc:title "Improved Product Name"@en .
<urn:ITEMID:12345:REV-2> dc:issued "2007-06-01"@en .

According to this data, there was an item revision on "2007-06-01" where only the item name was changed to "Improved Product Name". As you can see, "dc:format" and "dc:extent" are missing from the latest data revision. This is on purpose to avoid millions of duplicate records!

I can write a SPARQL query that shows me the latest product revision information (REV-2: dc:title and dc:issued), but it's missing "dc:format" and "dc:extent", which I want carried over from the last revision (REV-1).

How can I write a SPARQL query to do this? Any help much appreciated!

+1  A: 

Not sure you can do this in one query. I'll think more on it if I can, but the following two queries might get you started in the right direction:

1) Find the changes that don't have a format

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mymeta: <http://www.mymeta.com/meta/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

DESCRIBE ?change
WHERE 
{
    ?item a mymeta:item;
             mymeta:itemchange ?change.
    ?change ?p ?o.
    OPTIONAL 
    {
        ?change dc:format ?format .
    }
    FILTER (!bound(?format)) 
}

2) I think this will find the most recent change that does have a format

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mymeta: <http://www.mymeta.com/meta/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?format
WHERE {
    ?item a mymeta:item;
          mymeta:itemchange ?change.
    ?change dc:format ?format;
            dc:issued ?issued.
    # Look for a later change of the same item that also has a format.
    OPTIONAL {
        ?item mymeta:itemchange ?moreRecentChange.
        ?moreRecentChange dc:format ?moreRecentFormat;
                          dc:issued ?moreRecentIssued.
        # str() because the dates in the data are language-tagged literals.
        FILTER (str(?moreRecentIssued) > str(?issued))
    }
    # Keep only the change for which no such later change exists.
    FILTER (!bound(?moreRecentIssued))
}

With some more work it should be possible to limit the ?format of (2) to come from changes with an issue date before the issue date of a result from (1). So for each row from (1) you'd execute (2) to find the format value to use. You might get better results, though, if you used a rule-based reasoning engine rather than SPARQL. I'd recommend EulerSharp or Pellet.
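If your engine supports SPARQL 1.1, the two steps can probably be folded into one query. The following is only a sketch under that assumption, not something I have run against your data: for every property of an item it keeps the value from the latest revision that actually states that property, which is what carries dc:format and dc:extent over from REV-1. The variable names (?rev, ?later, ?laterIssued, ?laterValue) are my own, and the dates are compared with STR() because the sample data stores them as language-tagged literals (ISO dates still sort correctly as strings).

```sparql
PREFIX mymeta: <http://www.mymeta.com/meta/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# For each item and property, return the value from the most recent
# revision that states that property.
SELECT ?item ?p ?o ?issued
WHERE {
    ?item a mymeta:item ;
          mymeta:itemchange ?rev .
    ?rev dc:issued ?issued ;
         ?p ?o .
    # The issue date is reported in its own column, not as a property row.
    FILTER (?p != dc:issued)
    # Drop the row if a later revision of the same item restates ?p.
    FILTER NOT EXISTS {
        ?item mymeta:itemchange ?later .
        ?later dc:issued ?laterIssued ;
               ?p ?laterValue .
        FILTER (STR(?laterIssued) > STR(?issued))
    }
}
```

On the sample data this should give dc:title from REV-2 and dc:format and dc:extent from REV-1, with ?issued showing which revision each value came from.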

Bill Barnhill