I have two queries:

query 1:

SELECT DISTINCT ?o COUNT(?o)  
WHERE 
{ ?s1 ?somep1 <predicate_one-uri>. ?s1 ?p ?o}

query 2:

SELECT DISTINCT ?o COUNT(?o)  
WHERE 
{?s2 ?somep2 <predicate_two-uri>.?s2 ?p ?o.}

Each query gives me a different result set (as expected). I need the union of these two sets, and from what I understand the query below should give me the set I want:

SELECT DISTINCT ?o COUNT(?o)  
WHERE 
{
 { ?s1 ?somep1 <predicate_one-uri>.?s1 ?p1 ?o}
  UNION 
 {?s2 ?somep2 <predicate_two-uri>.?s2 ?p2 ?o.}
}

The problem is that some results from query 1 are missing from the union set, and likewise for query 2. The union is not working properly, as it does not include all the results of both queries. Please advise on the proper structure of the SPARQL query for achieving the desired result set.

However, if I run the following query (the same one with the COUNT function removed):

SELECT DISTINCT ?o
WHERE 
{
{ ?s1 ?somep1 <predicate_one-uri>.?s1 ?p ?o}
 UNION {?s2 ?somep2 <predicate_two-uri>.?s2 ?p ?o.}
}

I get the appropriate result set. But I also need to have the frequency of the variable ?o.

Thanks in advance!

JP Levac

A: 

I'm not entirely sure here, but I have a theory, which may be entirely wrong.

Your query confuses me slightly, as it seems to imply some grouping: in theory at least, a SPARQL engine should not let you select both a variable and an aggregate on that variable in the same query without an explicit GROUP BY. So the results may depend on which SPARQL engine/triplestore you are using.

If implicit grouping is happening, you may not get as many results as you expect, because the grouping will combine results from both sides of the union. For example, if query 1 gives you 10 results and query 2 gives you 5, then the maximum number of results you can get from the union is 15, but it may be fewer because the grouping may merge rows from the two sides. To avoid this, you should use completely different variable names on each side of the union, for example:

SELECT * WHERE { {?s ?p ?o} UNION {?x ?y ?z}}

This would give you a results table with a pattern like the following:

 ?s | ?p | ?o | ?x | ?y | ?z
-----------------------------
  a |  b |  c |    |    |
    |    |    |  a |  b |  c

I'm not sure whether any of that is relevant or useful to you. If you can provide more details about the environment you are executing the query in (triplestore, SPARQL engine, API/library, etc.), then I or someone else may be able to provide a better answer.

RobV
Thanks for your feedback. I am using OpenVirtuoso, which I believe uses Jena, allowing me to use the COUNT aggregate function. I got a reference here: http://stackoverflow.com/questions/1223472/sparql-query-and-distinct-count. I understand that the number of rows returned by a union could be less than the sum of both queries because of the union. My problem is that some values that were present in, say, query 1 don't show up in the union query at all. Sorry, I am still new to SPARQL and RDF; I believe that the triple store is in RDF/XML (does this make sense?). Thanks again, JPL
levacjeep
A: 

I think it will work if you remove the DISTINCT, and add GROUP BY ?o to the end of the query.

DISTINCT is really just for removing duplicates. It's not for grouping and counting.
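
As a rough sketch of that suggestion applied to the union query from the question: the two URIs are the question's own placeholders, the ?count alias is a name introduced here for illustration, and the `(COUNT(?o) AS ?count)` form assumes an engine that accepts SPARQL 1.1 aggregate syntax (some engines also accept the bare COUNT(?o) used in the question).

```sparql
# Sketch only: <predicate_one-uri> and <predicate_two-uri> are the placeholders
# from the question; ?count is a hypothetical alias added for illustration.
SELECT ?o (COUNT(?o) AS ?count)
WHERE
{
  { ?s1 ?somep1 <predicate_one-uri> . ?s1 ?p1 ?o }
  UNION
  { ?s2 ?somep2 <predicate_two-uri> . ?s2 ?p2 ?o }
}
# One output row per distinct ?o, so DISTINCT is no longer needed.
GROUP BY ?o
```

Note that UNION keeps duplicate solutions, so a subject that matches both branches contributes its ?o values to the count twice.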

cygri