views:

122

answers:

3

Hi,

Can anyone help me with a SQL query in Apache Derby SQL to get a "simple" count.

Given a table ABC that looks like this...

    
    id  a   b   c
    1   1   1   1
    2   1   1   2
    3   2   1   3
    4   2   1   1
**  5   2   1   2 **
**  6   2   2   1 **
    7   3   1   2
    8   3   1   3
    9   3   1   1

How can I write a query to get a count of how may distinct values of 'a' have both (b=1 and c=2) AND (b=2 and c=1) to get the correct result of 1. (the two rows marked match the criteria and both have a value of a=2, there is only 1 distinct value of a in this table that match the criteria)

The tricky bit is that (b=1 and c=2) AND (b=2 and c=1) are obviously mutually exclusive when applied to a single row. .. so how do I apply that expression across multiple rows of distinct values for a?

These queries are wrong but to illustrate what I'm trying to do...
SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 AND b=2 AND c=1 ...
.. (0) no go as mutually exclusive
SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 OR b=2 AND c=1 ...
.. (3) gets me the wrong result.
SELECT COUNT(a) (CASE WHEN b=1 AND c=10 THEN 1 END) FROM ABC WHERE b=2 AND c=1
.. (0) no go as mutually exclusive

Cheers, Phil.

+5  A: 

I'm assuming that (a,b,c) is unique. One way to do this is to use a self join:

SELECT COUNT(*)
FROM   ABC T1
JOIN   ABC T2
ON     T1.a = T2.a
WHERE  T1.b = 1 AND T1.c = 2
AND    T2.b = 2 AND T2.c = 1

This works conceptually as follows:

  • Find all the rows that satisfy (b,c) = (1,2)
  • Find all the rows that satisfy (b,c) = (2,1)
  • Join the above two sets on when a is the same.
  • Count the number of rows in the joined result.

An alternative way which might be easier to understand is to use a subselect:

SELECT COUNT(*)
FROM   ABC
WHERE  a IN (SELECT a FROM ABC
             WHERE  b = 2
             AND    c = 1)
AND    b = 1
AND    c = 2

Note: If there can be duplicated values of (a,b,c) then instead of SELECT COUNT(*) use SELECT COUNT(DISTINCT T1.a) in the first query, and SELECT COUNT(DISTINCT a) in the second.

These queries are tested in MySQL, not Apache Derby, but I hope they will work there too.

Mark Byers
Thanks Everyone for the answers and especially to you Mark for the prompt and accurate answer. I tried your second suggestion and it worked perfectly.Now because I'm actually trying to write a cross tab report I'm wondering whether I have to have one query for each combination of b and c values or whether I can somehow do it all in one query.Cheers, Phil.
Phil
A: 

The tricky bit is that (b=1 and c=2) AND (b=2 and c=1) are obviously mutually exclusive when applied to a single row

Exactly so. When thinking of the set of records, you want the number of distinct "a" values for rows wherein (b = 1 and c = 2) OR (b = 2 and c = 1). Try this:

SELECT COUNT(DISTINCT a) FROM "abc" WHERE (b=1 OR c=2) OR (b=2 OR c=1)
pilcrow
should be `AND` in the final query inside the parenthesis ..
Gaby
+1  A: 

Mark's second query should indeed be supported by Apache Derby, according to Apache Derby's SQL support page.

SELECT COUNT(DISTINCT a) FROM ABC
 WHERE b = 1 AND c = 2
   AND a IN (SELECT a FROM ABC WHERE b = 2 AND c = 1);

In addition to being easier to read than the self-join version, it should also be faster, since you avoid the overhead of having to do a JOIN.

Steve Simms