views:

103

answers:

6

I have two tables tabData and tabDataDetail. I want all idData(PK) from Parent-Table(tabData) that have only rows in Child-Table(tabDataDetail, FK is fiData) with:

  • fiActionCode=11 alone or
  • fiactionCode=11 and fiActionCode=34

Any other combination is invalid. How to get them?

What i've tried without success(slow and gives me also rows that have only fiActioncode 34) :

alt text

Thanks for your Time.


EDIT: Thanks to all for their answers. Now i unfortunately have not enough time to check which one is best or works at all. I marked the first working one as answer.

EDIT2: i think that the marked answer is really the most efficient and compact solution.

EDIT3: Codesleuth's answer is interesting because it returns only rows than have only a single fiActionCode=11. Difficult to see, because that its only true for 20 tabDataDetail-rows ot of 41524189 total-rows that have two. Anyway that was not 100% what i've asked or rather what i was looking for.

+4  A: 
Select ...
From tabData As T1
Where Exists    (
                Select 1
                From tabDataDetail As TDD1
                Where TDD1.fiData = T1.idData
                    And TDD1.fiactionCode = 11
                )
    And Not Exists    (
                      Select 1
                      From tabDataDetail As TDD1
                      Where TDD1.fiData = T1.idData
                          And TDD1.fiactionCode Not In(11,34)
                    )

To expand on my logic, the first check (a correction) is to ensure that a row with fiActionCode = 11 exists. The second check works by first defining the set of rows we do not want. We do not want anything that is something other than fiActionCode = 11 or 34. Because that is the set of items we do not want, we search for anything that does not exist in that set.

Thomas
Thanks. But that gives me also the rows that have only 34 as fiActionCode in Childtable. These should be excluded.
Tim Schmelter
@Tim - Fixed. Just need an additional check to ensure that fiActionCode = 11 exists.
Thomas
PK is idData and tablename is tabData but apart from that it seems to work(i don't knwo why). I get 400k rows so its difficult to check.
Tim Schmelter
@Tim - A simple verification check would be to determine if any resulting row has a tabDataDetail row with an fiActionCode value equal to something other than 34 (we know it must have a row with a value = 11). Just adding an inner join to tabDataDetail would make a visual inspection easier.
Thomas
@Thomas: super efficient, nice one!
Codesleuth
@Thomas: could you please just change "Select ... From idData As T1" to "Select idData From tabData AS T1"? That could be confusing for others. Thanks.
Tim Schmelter
+1  A: 

Edit : Apols - I see what you mean with child rows. This isn't particular efficient. Thanks also to Lieven for the data.

SELECT idData FROM
tabData td
WHERE EXISTS 
(
    SELECT 1 
        FROM tabDataDetail tdd 
        WHERE tdd.fiData = td.idData AND fiActionCode = 11
 )
AND NOT EXISTS
(
    SELECT 1 
        FROM tabDataDetail tdd 
        WHERE tdd.fiData = td.idData AND fiActionCode <> 11
 )
UNION
SELECT idData 
    FROM tabData td
    WHERE EXISTS 
    (
        SELECT 1 
            FROM tabDataDetail tdd 
            WHERE tdd.fiData = td.idData AND fiActionCode = 11
     )
    AND EXISTS
    (
        SELECT 1 
            FROM tabDataDetail tdd 
            WHERE tdd.fiData = td.idData AND fiActionCode = 34
     )
AND NOT EXISTS
(
    SELECT 1 
        FROM tabDataDetail tdd 
        WHERE tdd.fiData = td.idData AND fiActionCode NOT IN (11, 34)
 )
nonnb
That works too.
Tim Schmelter
+4  A: 

Reasoning

  1. LEFT OUTER JOIN excludes all idData's that have an id different from 11 or 34
  2. HAVING excludes all idData's that only have a 34
  3. Remaining records (should) satisfy all constraints

Test data

DECLARE @tabData TABLE (idData INTEGER)
DECLARE @tabDataDetail TABLE (fiData INTEGER, fiActionCode INTEGER)

INSERT INTO @tabData VALUES (1)
INSERT INTO @tabData VALUES (2)
INSERT INTO @tabData VALUES (3)
INSERT INTO @tabData VALUES (4)
INSERT INTO @tabData VALUES (5)

/* Only idData 1 & 2 should be returned */
INSERT INTO @tabDataDetail VALUES (1, 11)
INSERT INTO @tabDataDetail VALUES (2, 11)
INSERT INTO @tabDataDetail VALUES (2, 34)
INSERT INTO @tabDataDetail VALUES (3, 99)
INSERT INTO @tabDataDetail VALUES (4, 11)
INSERT INTO @tabDataDetail VALUES (4, 99)
INSERT INTO @tabDataDetail VALUES (5, 34)

Query

SELECT  *
FROM    @tabData d
        INNER JOIN @tabDataDetail dd ON dd.fiData = d.idData
        INNER JOIN (
          SELECT  idData
          FROM    @tabData d
                  INNER JOIN @tabDataDetail dd ON dd.fiData = d.idData
                  LEFT OUTER JOIN (
                    SELECT  fiData
                    FROM    @tabDataDetail
                    WHERE   fiActionCode NOT IN (11, 34)
                  ) exclude ON exclude.fiData = d.idData
          WHERE   exclude.fiData IS NULL                
          GROUP BY
                  idData
          HAVING  MIN(fiActionCode) = 11        
        ) include ON include.idData = d.idData
Lieven
Thanks, but i get several 'The multi-part identifier "d.idData" could not be bound.' and 'Ambiguous column name'
Tim Schmelter
Have you removed the @? The query runs without problems on my system.
Lieven
Works too(somewhat slow). Thanks
Tim Schmelter
@Tim, proper indexing might go a long way to speed it up, but it will be hard (if not impossible) to beat Thomas' `NOT EXISTS` solution.
Lieven
+1  A: 

Edited my answer based on clarification given in comments on other answers.

select td.idData
 from tabData td
  left join tabDataDetail tdd
   on td.idData = tdd.fiData
    and tdd.fiActionCode = 11
  left join tabDataDetail tdd2
   on td.idData = tdd2.fiData
    and tdd2.fiActionCode = 34
  left join tabDataDetail tdd3
   on td.idData = tdd3.fiData
    and tdd3.fiActionCode not in (11,34)
 where (tdd.fiData is not null
  or (tdd.fiData is not null and tdd2.fiData is not null))
  and tdd3.fiData is null
 group by td.idData
Joe Stefanelli
Also correct result. This method("null-check") is new to me.
Tim Schmelter
+1  A: 

Thanks @Lieven for the data code to test this:

DECLARE @tabData TABLE (idData INTEGER)
DECLARE @tabDataDetail TABLE (idDataDetail int IDENTITY(1,1),
    fiData INTEGER, fiActionCode INTEGER)

INSERT INTO @tabData VALUES (1)
INSERT INTO @tabData VALUES (2)
INSERT INTO @tabData VALUES (3)
INSERT INTO @tabData VALUES (4)
INSERT INTO @tabData VALUES (5)

/* Only idData 1 & 2 should be returned */
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (1, 11)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (2, 11)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (2, 34)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (3, 99)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (4, 11)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (4, 99)
INSERT INTO @tabDataDetail (fiData,fiActionCode) VALUES (5, 34)

Query:

SELECT  td.idData
FROM    @tabData td
        INNER JOIN @tabDataDetail tdd ON td.idData = tdd.fiData
WHERE   tdd.fiActionCode = 11 -- check 11 exists
        AND NOT EXISTS ( SELECT * FROM @tabDataDetail WHERE fiData = td.idData
                          AND idDataDetail <> tdd.idDataDetail )
            -- ensures *only* 11 exists (0 results from subquery)
UNION
SELECT  td.idData
FROM    @tabData td
        INNER JOIN @tabDataDetail tdd1 ON td.idData = tdd1.fiData
        INNER JOIN @tabDataDetail tdd2 ON td.idData = tdd2.fiData
WHERE   tdd1.fiActionCode = 11 -- check 11 exists
        AND tdd2.fiActionCode = 34 -- check 34 exists

Returns:

idData
-----------
1
2

(2 row(s) affected)

With only 1 subquery here (and it being a COUNT instead of a very-slow NOT EXISTS) this creates a very neat execution plan which should help if you're having problems with speed.

Codesleuth
@CodeSleuth, don't mention it. The cost according to performance analyzer is 24% for my solution, 32% for your's. I'm interested to know in a real world scenario if that holds.
Lieven
I haven't really looked at the question or this answer in any detail but as a general point `NOT EXISTS` is more efficient than `COUNT(*)` in SQL Server as it does an anti semi join. It just has to check that a matching row doesn't exist. Not count all the ones that match.
Martin Smith
@Lieven: cost of what? Just the subquery? Sorry I seem to have gotten `NOT IN` mixed up in my head with `NOT EXISTS` - I'm wrong, the performance is actually better than the `COUNT` in my answer. I'll edit it in. Cheers!
Codesleuth
@Lieven, @Martin Smith: fixed, sorry about that again :(
Codesleuth
The query returns 212040 tabData rows but the correct result should be 212050. I yet dont know why.
Tim Schmelter
@Tim: which ones is it missing?
Codesleuth
Interesting. The 10 rows are having 2 detail-rows each with two fiActionCode=11 (what should be impossible in my BL). Hence i must admit that your result is most precise.
Tim Schmelter
It was not 100% clear from question if was looking for fiActionCode=11 only in terms of of no other fiActionCode or only in terms of single. I need first alternative. But was interesting to see that it is possible to have more than one 11. Thanks.
Tim Schmelter
@Tim: cool, I had assumed you had a constraint in place for that, from what it seemed in your question. I'm glad my answer helped you a little further :)
Codesleuth
+1  A: 

This does it with one pass through the data I think.

It depends on the data distribution whether or not that would be preferable to doing 2 separate lookups.

WITH matches AS
(
SELECT fiData
FROM tabDataDetail 
GROUP BY fiData
HAVING COUNT(CASE WHEN fiactionCode = 11 THEN 1 END) > 0
AND COUNT(CASE WHEN fiactionCode NOT IN (11,34) THEN 1 END) = 0
)
SELECT ...
FROM idData i
JOIN matches m
ON  m.fiData = i.idData
Martin Smith
This works too (when changing 'SELECT ... FROM idData i' to 'SELECT idData FROM tabData i'). Thanks.
Tim Schmelter