tags:

views:

97

answers:

5

I am retrieving three different sets of data (or what should be "unique" rows). In total, I expect 3 different unique sets of rows because I have to complete different operations on each set of data. I am, however, retrieving more rows than there are in total in the table, meaning that I must be retrieving duplicate rows somewhere. Here is an example of my three sets of queries:

SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
     ON t2.ID = t1.ID
      AND t2.NAME = t1.NAME
      AND t2.ADDRESS <> t1.ADDRESS


SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
     ON t2.ID = t1.ID
      AND t2.NAME <> t1.NAME
      AND t2.ADDRESS <> t1.ADDRESS


SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
     ON t2.ID <> t1.ID
      AND t2.NAME = t1.NAME
      AND t2.ADDRESS <> t1.ADDRESS

As you can see, I am selecting (in order of queries)

  • Set of data where the id AND name match
  • Set of data where the id matches but the name does NOT
  • Set of data where the id does not match but name DOES

I am retrieving MORE rows than exist in T1 when adding up the number of results returned from all three queries which I don't think is logically possible, plus this means I must be duplicating rows (if it is logically possible) somewhere which prevents me from executing different commands against each set (since a row would have another command executed on it).

Can someone find where I'm going wrong here?

+1  A: 

Are you sure that NAME and ID are unique in both tables?

If not, you could have a situation, for example, where table 1 has this:

NAME: Fred ID: 1

and table2 has this:

NAME: Fred ID: 1

NAME: Fred ID: 2

In this case, the record in table1 will be returned by two of your queries: ID and NAME both match, and NAME matches but ID doesn't.

You might be able to narrow down the problem by intersecting each combination of two queries to find out what the duplicates are, e.g.:

SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
                AND t2.NAME = t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
INTERSECT
SELECT DISTINCT t1.*
    FROM table1 t1
    INNER JOIN table2 t2
        ON t2.ID = t1.ID
                AND t2.NAME <> t1.NAME
                AND t2.ADDRESS <> t1.ADDRESS
Eric Rosenberger
Name and ID are NOT unique, in either table, and that is the exact case I found after using Intersect. Table 1 has a "unique" value matched, but table 2 has two possible returns for that. I have a unique value in each table (I'm basically meshing two similar tables) that I think I can use.
Organiccat
A: 

I think the last query could be the one fetching extra set of rows.

i.e. It is relying on Name matching in both tables (and not on ID)

shahkalpesh
+2  A: 

Consider if Name is not unique. If you have the following data:

Table 1                        Table 2
ID    Name      Address        ID    Name      Address
0     Jim Smith 1111 A St      0     Jim Smith 2222 A St
1     Jim Smith 2222 B St      1     Jim Smith 3333 C St

Then Query 1 gives you:

0     Jim Smith 1111 A St
1     Jim Smith 2222 B St

Because rows 1 & 2 in Table 1 match rows 1 & 2, respectively in Table 2.

Query 2 gives you nothing.

Query 3 gives you

0     Jim Smith 1111 A St
1     Jim Smith 2222 B St

Because row 1 in Table 1 matches row 2 in Table 2 and row 2 in Table 1 matches row 1 in Table 2. Thus you get 4 rows out of Table 1 when there are only 2 rows in it.

tvanfosson
Those addresses look like they match. Wouldn't there be zero results for all?
recursive
Cut/paste error. Fixed now.
tvanfosson
Ah, my error was in thinking that it would return only 1 row from Table 1, despite there being a match for another row as well with the same data in Table 2.
Organiccat
+1  A: 

Assuming that T2.ID has a unique constraint, it is still quite logically possible for this scenario to occur.If for every record in T1, there are two corresponding records in T2:

  1. Same name, same id, different address
  2. Same name, different id, different address

Then the same record for T1 can come up in the first and third query for example.

It is also possible to simultaneously get the same row in the second and third query.

If T2.ID is not guaranteed to be unique, then you could get the same row from T1 in all three queries.

recursive
A: 

To find the offending data (and help find your logic hole) I would recommend:

(caution pseudo-code)

Limit the results to just SELECT id FROM ....

  • UNION the result sets

  • COUNT(id)

  • GROUP BY id

  • HAVING count(id) > 1

This will show the records that match more than one sub-query.

Chris Nava