ansaurus

Question

SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows

Answer 1

A:

Well, you could do with some definition in the database of which are the required elements. So I'll create one:

CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');

(this could be a temporary table or just an inline union of constants if it's dynamic)

Now the set of rows we expect to find in b is given by:

SELECT a.a_id, required.name FROM a CROSS JOIN required;

So we outer join this set against b to determine what's present and what's not:

SELECT a.a_id, required.name, b.b_id
FROM a
     CROSS JOIN required
     LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;

or alternatively:

SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);

Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.

araqnid 2010-05-16 20:55:39

the search flags 'foo', 'bar' can change all the time, I'm reluctant to use temporary tables every time I do this query.otherwise I'll investigate the performance of this tomorrow. thanks

milianw 2010-05-16 22:32:25

in postgres you can specify (values('foo'),('bar')) in the FROM clause directly. More portably, you can write (select 'foo' union all select 'bar') etc. which is just more verbose but does the same thing.

araqnid 2010-05-17 00:04:29

Answer 2

+1 A:

This should work with any SQL standard RDBMS:

SELECT 
   a.a_id, 
   Foo.b_id as Foo_Id,
   Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
   a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
   a.a_id = Bar.a_id
WHERE
   Foo.a_id IS NULL
   OR Bar.a_id IS NULL

Mark Brackett 2010-05-16 20:59:30

yeah, that should work and the query could be generated depending on the searches, but I feel this might be very slow as it requires one join for each searched name, see my answer below for maybe a better option.

milianw 2010-05-16 22:33:14

Answer 3

A:

I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:

SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
  FROM a
  LEFT JOIN b USING (a_id)
  GROUP BY a.a_id
  HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;

milianw 2010-05-16 22:36:39

FYI: Not wise to rely on column aliases for use in GROUP BY and HAVING clauses - not all databases support it, most only allow column alias referencing in the ORDER BY.

OMG Ponies 2010-05-16 22:49:32

thanks, will try it with sqlite + postgres tomorrow

milianw 2010-05-16 23:15:09

yes indeed, postgres does not support aliases in HAVING *sigh*. SQLite works though...What's worse though:- postgres requires explicit casts in the SUM- postgres needs _all_ columns to be in either a GROUP BY or an aggregate functionessentially this makes the query horribly complex and I bet also inefficient. I think I'll evaluate what's missing in code after all...

milianw 2010-05-17 14:39:09

Answer 4

A:

turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.

milianw 2010-05-17 16:36:26

ansaurus

tags:

views:

answers:

SQL: find entries in 1:n relation that don't comply with condition spanning multiple rows

related questions