tags:

views:

25

answers:

4

I have two tables in my database:

test1 which looks like:

ID   pubmed_ID
-------------------
1    22
2    22

test2 looks like:

antigen    pubmed_ID
------------------
Hello      22
Bye        22

when i use the following select statment:

select * 
  from test1, test2 
 where test1.pubmed_ID = test2.pubmed_ID;

I get:

ID pubmed_ID antigen pubmed_ID
--------------------------------
1  22        Hello   22   
2  22        Hello   22   
1  22        Bye     22       
2  22        Bye     22

Why have the antigens been duplicated? When they only exist once in the test2 table?

The primary keys are as follows test1 = "ID" column and for test 2 both the "antigen" and "pubmed_ID" form a compund primary key

Am i missing something simple here?

+2  A: 

Because both are matches. Which do you expect it to pick?

All the rows have the same pubmed_ID. Thus, each row of test1 matches each row of test2, so there are four (2*2) rows in the joined table.

Matthew Flaschen
How would i select against the "double" matches?
Harpal
@Harpal, which of the 4 do you want?
Matthew Flaschen
Rows 1 and 4 please
Harpal
@Harpal, but why those? How should MySQL determine it?
Matthew Flaschen
Because ID "1" corresponds to Antigen "hello" and "2" corresponds to "bye"
Harpal
Then maybe you need to add an `ID` field to to `test2` so you can join on that. It's hard to tell if that makes sense. You haven't given much data or info about your schema.
Matthew Flaschen
Hmmm ok back to the drawing board for my schema, thanks for your help anyway
Harpal
+2  A: 

It's because two records in test2 match per record in test1, based on the join criteria.

If you want it to be more selective, you need to add criteria that filters how you'd like.

OMG Ponies
Failing that, SELECT DISTINCT should help.
Kalium
@Kalium: Based on the data in the example, DISTINCT wouldn't help because of the ID value - it's different per set of the other values.
OMG Ponies
Indeed it is. This is just straight data pollution, then.
Kalium
+1  A: 

Your join is operating as it should. You are matching pumed_ID which is the same for every row.

When the DBE grabs row 1 from table test1

ID   pubmed_ID
-------------------
1    22

And then matches to rows with the same pubmed_ID in table test2, both rows match...

antigen    pubmed_ID
------------------
Hello      22
Bye        22

The exact same thing happens with the second row from table test1, making the entire result set contain four rows.

Make sense?

Bobby B
A: 

It is simple, the test1.ID = 1 matches, through it's pubmed_ID both Hello and Bye, and the same for the other test1 record.

xav0989