views:

60

answers:

4

I'm trying to optimize SQL queries in Akonadi and came across the following problem that is apparently not easy to solve with SQL, at least for me:

Assume the following table structure (should work in SQLite, PostgreSQL, MySQL):

CREATE TABLE a (
  a_id INT PRIMARY KEY
);

INSERT INTO a (a_id) VALUES (1), (2), (3), (4);

CREATE TABLE b (
  b_id INT PRIMARY KEY,
  a_id INT,
  name VARCHAR(255) NOT NULL
);

INSERT INTO b (b_id, a_id, name)
       VALUES (1, 1, 'foo'), (2, 1, 'bar'), (3, 1, 'asdf'),
              (4, 2, 'foo'), (5, 2, 'bar'), (6, 3, 'foo');

Now my problem is to find entries in a that are missing name entries in table b. E.g. I need to make sure each entry in a has at least the name entries "foo" and "bar" in table b. Hence the query should return something similar to:

a_id = 3 is missing name "bar"
a_id = 4 is missing name "foo" and "bar"

Since both tables are potentially huge in Akonadi, performance is of utmost importance.

One solution in MySQL would be:

SELECT a.a_id,
       CONCAT('|', GROUP_CONCAT(name ORDER BY NAME ASC SEPARATOR '|'), '|') as names
  FROM a
  LEFT JOIN b USING( a_id )
  GROUP BY a.a_id
  HAVING names IS NULL OR names NOT LIKE '%|bar|foo|%';

I have yet to measure the performance tomorrow, but severly doubt it's any fast for tens of thousand of entries in a and thrice as many in b. Furthermore we want to support SQLite and PostgreSQL where to my knowledge the GROUP_CONCAT function is not available.

Thanks, good night.

A: 

Well, you could do with some definition in the database of which are the required elements. So I'll create one:

CREATE TABLE required(name varchar(255) primary key);
INSERT INTO required VALUES('foo'), ('bar');

(this could be a temporary table or just an inline union of constants if it's dynamic)

Now the set of rows we expect to find in b is given by:

SELECT a.a_id, required.name FROM a CROSS JOIN required;

So we outer join this set against b to determine what's present and what's not:

SELECT a.a_id, required.name, b.b_id
FROM a
     CROSS JOIN required
     LEFT JOIN b ON b.a_id = a.a_id AND b.name = required.name;

or alternatively:

SELECT a.a_id, required.name
FROM a CROSS JOIN required
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.a_id AND b.name = required.name);

Assuming there's an index (and it seems likely from your description to be a uniqueness constraint) on b(a_id,name) that should work nicely. To some extent or another, it will scan a and cross check against b using the index.

araqnid
the search flags 'foo', 'bar' can change all the time, I'm reluctant to use temporary tables every time I do this query.otherwise I'll investigate the performance of this tomorrow. thanks
milianw
in postgres you can specify (values('foo'),('bar')) in the FROM clause directly. More portably, you can write (select 'foo' union all select 'bar') etc. which is just more verbose but does the same thing.
araqnid
+1  A: 

This should work with any SQL standard RDBMS:

SELECT 
   a.a_id, 
   Foo.b_id as Foo_Id,
   Bar.b_id as Bar_Id
FROM a
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'foo') as Foo ON
   a.a_id = Foo.a_id
LEFT OUTER JOIN (SELECT a_id, b_id FROM b WHERE name = 'bar') as Bar ON
   a.a_id = Bar.a_id
WHERE
   Foo.a_id IS NULL
   OR Bar.a_id IS NULL
Mark Brackett
yeah, that should work and the query could be generated depending on the searches, but I feel this might be very slow as it requires one join for each searched name, see my answer below for maybe a better option.
milianw
A: 

I got a nice tip in #sql on freenode by Ari-Ugwu and Xgc: Using the CrossTab pattern:

SELECT a.a_id, SUM(name = "foo") as hasFoo, SUM(name = "bar") as hasBar, ...
  FROM a
  LEFT JOIN b USING (a_id)
  GROUP BY a.a_id
  HAVING hasFoo < 1 OR hasFoo IS NULL OR hasBar < 1 OR hasBar IS NULL...;
milianw
FYI: Not wise to rely on column aliases for use in GROUP BY and HAVING clauses - not all databases support it, most only allow column alias referencing in the ORDER BY.
OMG Ponies
thanks, will try it with sqlite + postgres tomorrow
milianw
yes indeed, postgres does not support aliases in HAVING *sigh*. SQLite works though...What's worse though:- postgres requires explicit casts in the SUM- postgres needs _all_ columns to be in either a GROUP BY or an aggregate functionessentially this makes the query horribly complex and I bet also inefficient. I think I'll evaluate what's missing in code after all...
milianw
A: 

turns out that none of these are any faster than simply doing that stuff in the program itself... and the latter is much easier to do, hence I opted for that after all.

milianw