views:

5610

answers:

12

Given the following:

declare @a table
(
    pkid int,
    value int
)

declare @b table
(
    otherID int,
    value int
)


insert into @a values (1, 1000)
insert into @a values (1, 1001)
insert into @a values (2, 1000)
insert into @a values (2, 1001)
insert into @a values (2, 1002)

insert into @b values (-1, 1000)
insert into @b values (-1, 1001)
insert into @b values (-1, 1002)

How do I query for all the values in @a that completely match up with @b?

{@a.pkid = 1, @b.otherID = -1} would not be returned (only 2 of 3 values match)

{@a.pkid = 2, @b.otherID = -1} would be returned (3 of 3 values match)

Refactoring tables can be an option.

EDIT: I've had success with the answers from James and Tom H.

When I add another case in @b, they fall a little short.

insert into @b values (-2, 1000)

Assuming this should return two additional rows ( {@a.pkid = 1, @b.otherID = -2} and {@a.pkid = 2, @b.otherID = -2}, it doesn't work. However, for my project this is not an issue.

Thanks for the responses!

A: 

Several ways of doing this, but a simple one is to create a union view as

create view qryMyUinion as select * from table1 union all select * from table2

be careful to use union all, not a simple union as that will omit the duplicates

then do this

select count( * ), [field list here] from qryMyUnion group by [field list here] having count( * ) > 1

the Union and Having statements tend to be the most overlooked part of standard SQL, but they can solve a lot of tricky issues that otherwise require procedural code

Cruachan
Can you post some code that shows this?
Austin Salonen
+6  A: 

Probably not the cheapest way to do it:

SELECT a.pkId,b.otherId FROM
 (SELECT a.pkId,CHECKSUM_AGG(DISTINCT a.value) as 'ValueHash' FROM @a a GROUP BY a.pkId) a
 INNER JOIN (SELECT b.otherId,CHECKSUM_AGG(DISTINCT b.value) as 'ValueHash' FROM @b b GROUP BY b.otherId) b
ON a.ValueHash = b.ValueHash

You can see, basically I'm creating a new result set for each representing one value for each Id's set of values in each table and joining only where they match.

James
I must say that your solution is an interesting approach. I'm not sure I would rely on a hash value without using some other method of double checking. But anyway it is very interesting to read so I just wanted to say it's cool.
Cervo
I thought this was pretty slick solution so I had to add my vote.
Quintin Robinson
Thanks. Yea, I'd be kinda leery about relying on just a single hash value if there are large amounts of data. You could gain a lot of confidence without doubling the expense by doing a second hash after appending some fixed value to each of the a and b values. Like CHECKSUM_AGG(a.value + 'aaa')
James
I take that back...doing so would almost double the expense, I think.
James
A: 

If you are trying to return only complete sets of records, you could try this. I would definitely recommend using meaningful aliases, though ...

Cervo is right, we need an additional check to ensure that a is an exact match of b and not a superset of b. This is more of an unwieldy solution at this point, so this would only be reasonable in contexts where analytical functions in the other solutions do not work.

select 
 a.pkid,
 a.value
from
 @a a
where
 a.pkid in
 (
 select
  pkid
 from
  (
  select 
   c.pkid,
   c.otherid,
   count(*) matching_count
  from 
   (
   select 
    a.pkid,
    a.value,
    b.otherid
   from 
    @a a inner join @b b 
    on a.value = b.value
   ) c
  group by 
   c.pkid,
   c.otherid
  ) d
  inner join
  (
  select 
   b.otherid,
   count(*) b_record_count
  from
   @b b
  group by
   b.otherid
  ) e
  on d.otherid = e.otherid
  and d.matching_count = e.b_record_count
  inner join
  (
  select 
   a.pkid match_pkid,
   count(*) a_record_count
  from
   @a a
  group by
   a.pkid
  ) f
  on d.pkid = f.match_pkid
  and d.matching_count = f.a_record_count
 )
Dave DuPlantis
I don't think this works. I took your code and cut and pasted it with the code above and added the line insert into @a values (2, 1003). The problem is you don't check the count in @a as well. So if B matches with everything in A you say it is correct. This ignores A having more entries than b.
Cervo
+1  A: 

Works for your example, and I think it will work for all cases, but I haven't tested it thoroughly:

SELECT
    SQ1.pkid
FROM
    (
     SELECT
      a.pkid, COUNT(*) AS cnt
     FROM
      @a AS a
     GROUP BY
      a.pkid
    ) SQ1
INNER JOIN
    (
     SELECT
      a1.pkid, b1.otherID, COUNT(*) AS cnt
     FROM
      @a AS a1
     INNER JOIN @b AS b1 ON b1.value = a1.value
     GROUP BY
      a1.pkid, b1.otherID
    ) SQ2 ON
     SQ2.pkid = SQ1.pkid AND
     SQ2.cnt = SQ1.cnt
INNER JOIN
    (
     SELECT
      b2.otherID, COUNT(*) AS cnt
     FROM
      @b AS b2
     GROUP BY
      b2.otherID
    ) SQ3 ON
     SQ3.otherID = SQ2.otherID AND
     SQ3.cnt = SQ1.cnt
Tom H.
A: 

As CQ says, a simple inner join is all you need.

Select * -- all columns but only from #a
from #a 
inner join #b 
on #a.value = #b.value -- only return matching rows
where #a.pkid  = 2
Dave Jackson
+1  A: 
-- Note, only works as long as no duplicate values are allowed in either table
DECLARE @validcomparisons TABLE (
    pkid INT,
    otherid INT,
    num INT
)

INSERT INTO @validcomparisons (pkid, otherid, num)
SELECT  a.pkid, b.otherid, A.cnt
FROM    (select pkid, count(*) as cnt FROM @a group by pkid) a
INNER JOIN  (select otherid, count(*) as cnt from @b group by otherid) b 
    ON b.cnt = a.cnt

DECLARE @comparison TABLE (
    pkid INT,
    otherid INT,
    same INT)

insert into @comparison(pkid, otherid, same)
SELECT a.pkid, b.otherid, count(*)
FROM    @a a
INNER JOIN  @b b
    ON a.value = b.value
GROUP BY    a.pkid, b.otherid

SELECT  COMP.PKID, COMP.OTHERID
FROM    @comparison comp
INNER JOIN  @validcomparisons val
    ON comp.pkid = val.pkid
    AND comp.otherid = val.otherid
    AND comp.same = val.num
Cervo
+2  A: 

The following query gives you the requested results:

select A.pkid, B.otherId
    from @a A, @b B 
    where A.value = B.value
    group by A.pkid, B.otherId
    having count(B.value) = (
     select count(*) from @b BB where B.otherId = BB.otherId)
Frans
A: 

To iterate the point further:

select a.*
from @a a 
inner join @b b on a.value = b.value

This will return all the values in @a that match @b

+1  A: 

I've added a few extra test cases. You can change your duplicate handling by changing the way you use distinct keywords in your aggregates. Basically, I'm getting a count of matches and comparing it to a count of required matches in each @a and @b.

declare @a table
(
    pkid int,
    value int
)

declare @b table
(
    otherID int,
    value int
)


insert into @a values (1, 1000)
insert into @a values (1, 1001)

insert into @a values (2, 1000)
insert into @a values (2, 1001)
insert into @a values (2, 1002)

insert into @a values (3, 1000)
insert into @a values (3, 1001)
insert into @a values (3, 1001)

insert into @a values (4, 1000)
insert into @a values (4, 1000)
insert into @a values (4, 1001)


insert into @b values (-1, 1000)
insert into @b values (-1, 1001)
insert into @b values (-1, 1002)

insert into @b values (-2, 1001)
insert into @b values (-2, 1002)

insert into @b values (-3, 1000)
insert into @b values (-3, 1001)
insert into @b values (-3, 1001)



SELECT Matches.pkid, Matches.otherId
FROM
(
    SELECT a.pkid, b.otherId, n = COUNT(*)
    FROM @a a
    INNER JOIN @b b
     ON a.Value = b.Value
    GROUP BY a.pkid, b.otherId
) AS Matches

INNER JOIN 
(
    SELECT
     pkid,
     n = COUNT(DISTINCT value)
    FROM @a
    GROUP BY pkid
) AS ACount
ON Matches.pkid = ACount.pkid

INNER JOIN
(
    SELECT
     otherId,
     n = COUNT(DISTINCT value)
    FROM @b
    GROUP BY otherId
) AS BCount
    ON Matches.otherId = BCount.otherId

WHERE Matches.n = ACount.n AND Matches.n = BCount.n
A: 

1) i assume that you don't have duplicate id

2) get the key with the same number of value

3) the row with the number of key value equal to the number of equal value is the target

I hope it's what you searched for (you don't search performance don't you ?)

declare @a table( pkid int, value int) declare @b table( otherID int, value int)

insert into @a values (1, 1000)

insert into @a values (1, 1001)

insert into @a values (2, 1000)

insert into @a values (2, 1001)

insert into @a values (2, 1002)

insert into @a values (3, 1000)

insert into @a values (3, 1001)

insert into @a values (4, 1000)

insert into @a values (4, 1001)

insert into @b values (-1, 1000)

insert into @b values (-1, 1001)

insert into @b values (-1, 1002)

insert into @b values (-2, 1001)

insert into @b values (-2, 1002)

insert into @b values (-3, 1000)

insert into @b values (-3, 1001)

select cntok.cntid1 as cntid1, cntok.cntid2 as cntid2

  from
 (select cnt.cnt, cnt.cntid1, cnt.cntid2 from
    (select acnt.cnt as cnt, acnt.cntid as cntid1, bcnt.cntid as cntid2 from
          (select count(pkid) as cnt, pkid as cntid from @a group by pkid)
           as acnt
                full join 
               (select count(otherID) as cnt, otherID as cntid from @b group by otherID)
                as bcnt
                   on  acnt.cnt = bcnt.cnt)
     as cnt
     where cntid1 is not null and cntid2 is not null)
   as cntok 
inner join 
(select count(1) as cnt, cnta.cntid1 as cntid1, cnta.cntid2 as cntid2
from
    (select cnt, cntid1, cntid2, a.value as value1 
     from
         (select cnt.cnt, cnt.cntid1, cnt.cntid2 from
            (select acnt.cnt as cnt, acnt.cntid as cntid1, bcnt.cntid as cntid2 from
                  (select count(pkid) as cnt, pkid as cntid from @a group by pkid)
                   as acnt
                        full join 
                       (select count(otherID) as cnt, otherID as cntid from @b group by otherID)
                        as bcnt
                           on  acnt.cnt = bcnt.cnt)
             as cnt
             where cntid1 is not null and cntid2 is not null)
         as cntok 
             inner join @a as a on a.pkid = cntok.cntid1)
      as cnta
         inner join

             (select cnt, cntid1, cntid2, b.value as value2 
             from
             (select cnt.cnt, cnt.cntid1, cnt.cntid2 from
                    (select acnt.cnt as cnt, acnt.cntid as cntid1, bcnt.cntid as cntid2 from
                          (select count(pkid) as cnt, pkid as cntid from @a group by pkid)
                           as acnt
                                full join 
                               (select count(otherID) as cnt, otherID as cntid from @b group by otherID)
                                as bcnt
                                   on  acnt.cnt = bcnt.cnt)
                     as cnt
                     where cntid1 is not null and cntid2 is not null)
                 as cntok 
                     inner join @b as b on b.otherid = cntok.cntid2)
               as cntb
               on cnta.cntid1 = cntb.cntid1 and cnta.cntid2 = cntb.cntid2 and cnta.value1 = cntb.value2
      group by cnta.cntid1, cnta.cntid2) 
   as cntequals
   on cntok.cnt = cntequals.cnt and cntok.cntid1 = cntequals.cntid1 and cntok.cntid2 = cntequals.cntid2
+1  A: 

How do I query for all the values in @a that completely match up with @b?

I'm afraid this definition is not quite perfectly clear. It seems from your additional example that you want all pairs of a.pkid, b.otherID for which every b.value for the given b.otherID is also an a.value for the given a.pkid.

In other words, you want the pkids in @a that have at least all the values for otherIDs in b. Extra values in @a appear to be okay. Again, this is reasoning based on your additional example, and the assumption that (1, -2) and (2, -2) would be valid results. In both of those cases, the a.value values for the given pkid are more than the b.value values for the given otherID.

So, with that in mind:

    select
    matches.pkid
    ,matches.otherID
from
(
    select 
     a.pkid
     ,b.otherID
     ,count(1) as cnt
    from @a a
    inner join @b b
     on b.value = a.value
    group by 
     a.pkid
     ,b.otherID
) as matches
inner join
(
    select
     otherID
     ,count(1) as cnt
    from @b
    group by otherID
) as b_counts
on b_counts.otherID = matches.otherID
where matches.cnt = b_counts.cnt
+4  A: 

This is more efficient (it uses TOP 1 instead of COUNT), and works with (-2, 1000):

SELECT  *
FROM    (
        SELECT  ab.pkid, ab.otherID,
                (
                SELECT  TOP 1 COALESCE(ai.value, bi.value)
                FROM    (
                        SELECT  *
                        FROM    @a aii
                        WHERE   aii.pkid = ab.pkid
                        ) ai
                FULL OUTER JOIN
                        (
                        SELECT  *
                        FROM    @b bii
                        WHERE   bii.otherID = ab.otherID
                        ) bi
                ON      ai.value = bi.value
                WHERE   ai.pkid IS NULL OR bi.otherID IS NULL
                ) unmatch
        FROM
                (
                SELECT  DISTINCT pkid, otherid
                FROM    @a a , @b b
                ) ab
        ) q
WHERE   unmatch IS NOT NULL
Quassnoi
I've run on similar problem, and decided to use your solution instead of James' (I think his solution is good but slick). +1 for you.
rafek